draft-ietf-mhtml-spec-02.txt   draft-ietf-mhtml-spec-03.txt 
Network Working Group Jacob Palme Network Working Group Jacob Palme
Internet Draft Stockholm University/KTH Internet Draft Stockholm University/KTH
draft-ietf-mhtml-spec-02.txt Alexander Hopmann draft-ietf-mhtml-spec-03.txt Alexander Hopmann
Category-to-be: Proposed standard ResNova Software, Inc. Category-to-be: Proposed standard ResNova Software, Inc.
Expires: February 1997 August 1996
MIME E-mail Encapsulation of Aggregate Documents, such as HTML (MHTML) MIME E-mail Encapsulation of Aggregate Documents, such as HTML (MHTML)
Status of this Document Status of this Document
This document is an Internet-Draft. Internet-Drafts are working This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, and documents of the Internet Engineering Task Force (IETF), its areas, and
its working groups. Note that other groups may also distribute working its working groups. Note that other groups may also distribute working
documents as Internet-Drafts. documents as Internet-Drafts.
skipping to change at page 1, line 36 skipping to change at page 1, line 37
Abstract Abstract
Although HTML [RFC 1866] was designed within the context of MIME, more Although HTML [RFC 1866] was designed within the context of MIME, more
than the specification of HTML as defined in RFC 1866 is needed for two than the specification of HTML as defined in RFC 1866 is needed for two
electronic mail user agents to be able to interoperate using HTML as a electronic mail user agents to be able to interoperate using HTML as a
document format. These issues include the naming of objects that are document format. These issues include the naming of objects that are
normally referred to by URIs, and the means of aggregating objects that normally referred to by URIs, and the means of aggregating objects that
go together. This document describes a set of guidelines that will allow go together. This document describes a set of guidelines that will allow
conforming mail user agents to be able to send, deliver and display conforming mail user agents to be able to send, deliver and display
these objects, such as HTML objects, that can contain links represdented these objects, such as HTML objects, that can contain links represented
by URIs. In order to be able to handle inter-linked objects, the by URIs. In order to be able to handle inter-linked objects, the
document proposes to use the MIME type multipart/related and specifies document proposes to use the MIME type multipart/related and specifies
the MIME content-headers "Content-Location" and "Content-Base". the MIME content-headers "Content-Location" and "Content-Base".
Table of Contents Table of Contents
1. Introduction 1. Introduction
2. Terminology 2. Terminology
2.1 Conformance requirement terminology 2.1 Conformance requirement terminology
2.2 Other terminology 2.2 Other terminology
skipping to change at page 2, line 29 skipping to change at page 2, line 29
8. Format of Links to Other Body Parts 8. Format of Links to Other Body Parts
8.1 General principle 8.1 General principle
8.2 Use of the Content-Location header 8.2 Use of the Content-Location header
8.3 Use of the Content-ID header and CID URLs 8.3 Use of the Content-ID header and CID URLs
9 Examples 9 Examples
9.1 Example of a HTML body without included linked objects 9.1 Example of a HTML body without included linked objects
9.3 Example with relative URIs to an embedded GIF picture 9.3 Example with relative URIs to an embedded GIF picture
9.4 Example using CID URL and Content-ID header to an embedded GIF 9.4 Example using CID URL and Content-ID header to an embedded GIF
picture picture
10. Content-Disposition header 10. Content-Disposition header
11. Character encoding issues 11. Character encoding issues and end-of-line issues
11.1 Character set issues
11.2 Line break characters
12. Security Considerations 12. Security Considerations
13. Acknowledgments 13. Acknowledgments
14. References 14. References
15. Author's Address 15. Author's Address
Mailing List Information Mailing List Information
Further discussion on this document should be done through the mailing Further discussion on this document should be done through the mailing
list MHTML@SEGATE.SUNET.SE. list MHTML@SEGATE.SUNET.SE.
skipping to change at page 7, line 23 skipping to change at page 6, line 45
Content-Type: Multipart/related; boundary="boundary-example-1"; Content-Type: Multipart/related; boundary="boundary-example-1";
type=Text/HTML; start=foo2*foo3@bar2.net type=Text/HTML; start=foo2*foo3@bar2.net
; A Content-Base header cannot be placed here, since this is a ; A Content-Base header cannot be placed here, since this is a
; multipart MIME object. ; multipart MIME object.
--boundary-example-1 --boundary-example-1
Part 1: Part 1:
Content-Type: Text/HTML; charset=US-ASCII Content-Type: Text/HTML; charset=US-ASCII
Content-ID: foo2*foo3@bar2.net Content-ID: foo2*foo3@bar2.net
Content-Location: "http/www.ietf.cnir.reston.va.us/images/foo1.bar1" Content-Location: "http/www.ietf.cnir.reston.va.us/images/foo1.bar1";
;
; This Content-Location must contain an absolute URI, since no base ; This Content-Location must contain an absolute URI, since no base
; is valid here. ; is valid here.
--boundary-example-1 --boundary-example-1
Part 2: Part 2:
Content-Type: Text/HTML; charset=US-ASCII Content-Type: Text/HTML; charset=US-ASCII
Content-ID: foo4*foo5@bar2.net Content-ID: foo4*foo5@bar2.net
Content-Location: "foo1.bar1" ; The Content-Base below applies to Content-Location: "foo1.bar1" ; The Content-Base below applies to
; this relative URI ; this relative URI
skipping to change at page 9, line 5 skipping to change at page 8, line 17
URIs. For example, HTML provides the BASE element for this. URIs. For example, HTML provides the BASE element for this.
(b) There is a Content-Base header (as defined in section 4.2), (b) There is a Content-Base header (as defined in section 4.2),
specifying the base to be used. specifying the base to be used.
(c) There is a Content-Location header in the heading of the body (c) There is a Content-Location header in the heading of the body
part which can then serve as the base in the same way as the part which can then serve as the base in the same way as the
request URI can serve as a base for relative URIs within a file request URI can serve as a base for relative URIs within a file
retrieved via HTTP [HTTP]. retrieved via HTTP [HTTP].
When the methods above do not yield an absolute URI the procedure in
section 8.2 for matching relative URIs MUST be followed.
6. Sending documents without linked objects 6. Sending documents without linked objects
If a document, such as an HTML object, is sent without other objects, to If a document, such as an HTML object, is sent without other objects, to
which it is linked, it MAY be sent as a Text/HTML body part by itself. which it is linked, it MAY be sent as a Text/HTML body part by itself.
In this case, multipart/related need not be used. In this case, multipart/related need not be used.
Such a document may either not include any links, or contain links which Such a document may either not include any links, or contain links which
the recipient resolves via ordinary net look up, or contain links which the recipient resolves via ordinary net look up, or contain links which
the recipient cannot resolve. the recipient cannot resolve.
skipping to change at page 11, line 25 skipping to change at page 10, line 37
If there is a Content-Base header, then the recipient MUST employ If there is a Content-Base header, then the recipient MUST employ
relative to absolute resolution as defined in RFC 1808 [RELURL] of relative to absolute resolution as defined in RFC 1808 [RELURL] of
relative URIs in both the HTML markup and the Content-Location header relative URIs in both the HTML markup and the Content-Location header
before matching a hyperlink in the HTML markup to a Content-Location before matching a hyperlink in the HTML markup to a Content-Location
header. The same applies if the Content-Location contains an absolute header. The same applies if the Content-Location contains an absolute
URI, and the HTML markup contains a BASE element so that relative URIs URI, and the HTML markup contains a BASE element so that relative URIs
in the HTML markup can be resolved. in the HTML markup can be resolved.
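
Illustration (not part of either draft): a minimal sketch of the rule above, assuming the same Content-Base value applies both to the Content-Location header and to the hyperlink in the HTML markup. The function names and the host example.org are hypothetical. Python's urljoin follows the later RFC 3986 rules, which agree with RFC 1808 for simple cases like these.

```python
from urllib.parse import urljoin


def resolve(base, uri):
    """Resolve a possibly relative URI against a base; absolute URIs pass through."""
    return urljoin(base, uri)


def matches_with_base(content_base, content_location, html_uri):
    """Compare the two URIs only after both have been made absolute."""
    return resolve(content_base, content_location) == resolve(content_base, html_uri)


# Hypothetical values, loosely modelled on the examples in this draft.
base = "http://www.example.org/images/"
print(matches_with_base(base, "foo1.bar1", "foo1.bar1"))
print(matches_with_base(base, "foo1.bar1", "http://www.example.org/images/foo1.bar1"))
```

Resolving both sides first means that a relative hyperlink and an absolute Content-Location (or the reverse) can still be matched.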
If there is NO Content-Base header, and the Content-Location header If there is NO Content-Base header, and the Content-Location header
contains a relative URI, then NO relative to absolute resolution SHOULD contains a relative URI, then NO relative to absolute resolution SHOULD
be performed (even if there is a BASE specification, such as the BASE be performed when matching Content-Location headers (even if there is a
element in HTML, in the body part containing the URI), and exact textual BASE specification, such as the BASE element in HTML, in the body part
match of the relative URI-s in the Content-Location and the HTML markup containing the URI), and exact textual match of the relative URI-s in
is performed instead (after removal of LWSP introduced as described in the Content-Location and the HTML markup is performed instead (after
section 4.4 above). removal of LWSP introduced as described in section 4.4 above). Note that
this only applies for matching Content-Location headers, not for URL-s
in the HTML document which are resolved through network look up at read
time.
If there is NO Content-Base header, and the Content-Location header
contains a relative URI, then NO relative to absolute resolution SHOULD
be performed. Matching the relative URI in the Content-Location header
to a hyperlink in an HTML markup text is in this case a two step
process. First remove any LWSP from the relative URI which may have been
introduced as described in section 4.4. Then perform an exact textual
match against the HTML URIs. For this matching process, ignore BASE
specifications, such as the BASE element in HTML. Note that this only
applies for matching Content-Location headers, not for URL-s in the HTML
document which are resolved through network look up at read time.
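
Illustration (not part of either draft): a minimal sketch of the two-step match described above for the case with no Content-Base header and a relative Content-Location. It assumes that LWSP means spaces, tabs and folding line breaks, and that the header parser has already removed any surrounding quotes from the Content-Location value. The names strip_lwsp and find_part are hypothetical.

```python
import re

# Assumption: LWSP covers spaces, tabs and the CR/LF of folded header lines.
_LWSP = re.compile(r"[ \t\r\n]+")


def strip_lwsp(uri):
    """Step 1: remove white space that header folding may have introduced."""
    return _LWSP.sub("", uri)


def find_part(html_uri, content_locations):
    """Step 2: exact textual match, with no relative-to-absolute resolution.

    Returns the index of the first matching body part, or None.
    BASE specifications in the HTML are deliberately ignored here.
    """
    for index, location in enumerate(content_locations):
        if strip_lwsp(location) == html_uri:
            return index
    return None


locations = ["foo1.bar1", "/images/ietf\r\n logo.gif"]
print(find_part("/images/ietflogo.gif", locations))
print(find_part("images/ietflogo.gif", locations))
```

The second lookup fails on purpose: without a base, "images/ietflogo.gif" and "/images/ietflogo.gif" are different strings and are not treated as equivalent.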
The URI in the Content-Location header need not refer to an object which The URI in the Content-Location header need not refer to an object which
is actually available globally for retrieval using this URI (after is actually available globally for retrieval using this URI (after
resolution of relative URIs). However, URI-s in Content-Location headers resolution of relative URIs). However, URI-s in Content-Location headers
(if absolute, or resolvable to absolute URIs) SHOULD still be globally (if absolute, or resolvable to absolute URIs) SHOULD still be globally
unique. unique.
8.3 Use of the Content-ID header and CID URLs 8.3 Use of the Content-ID header and CID URLs
When CID (Content-ID) URLs as defined in RFC 1738 [URL] and RFC 1873 When CID (Content-ID) URLs as defined in RFC 1738 [URL] and RFC 1873
skipping to change at page 13, line 25 skipping to change at page 12, line 44
type=Text/HTML type=Text/HTML
--boundary-example 1 --boundary-example 1
Content-Type: Text/HTML; charset=ISO-8859-1 Content-Type: Text/HTML; charset=ISO-8859-1
Content-Transfer-Encoding: QUOTED-PRINTABLE Content-Transfer-Encoding: QUOTED-PRINTABLE
... text of the HTML document, which might contain a hyperlink ... text of the HTML document, which might contain a hyperlink
to the other body part, for example through a statement such as: to the other body part, for example through a statement such as:
<IMG SRC="/images/ietflogo.gif" ALT="IETF logo"> <IMG SRC="/images/ietflogo.gif" ALT="IETF logo">
Example of a copyright sign encoded with Quoted-Printable: =A9 Example of a copyright sign encoded with Quoted-Printable: =A9
Example of a copyright sign mapped onto HTML markup: & 168; Example of a copyright sign mapped onto HTML markup: &#168;
--boundary-example-1 --boundary-example-1
Content-Location: "/images/ietflogo.gif" Content-Location: "/images/ietflogo.gif"
Content-Type: IMAGE/GIF Content-Type: IMAGE/GIF
Content-Transfer-Encoding: BASE64 Content-Transfer-Encoding: BASE64
R0lGODlhGAGgAPEAAP/////ZRaCgoAAAACH+PUNvcHlyaWdodCAoQykgMTk5 R0lGODlhGAGgAPEAAP/////ZRaCgoAAAACH+PUNvcHlyaWdodCAoQykgMTk5
NSBJRVRGLiBVbmF1dGhvcml6ZWQgZHVwbGljYXRpb24gcHJvaGliaXRlZC4A NSBJRVRGLiBVbmF1dGhvcml6ZWQgZHVwbGljYXRpb24gcHJvaGliaXRlZC4A
etc... etc...
skipping to change at page 14, line 15 skipping to change at page 13, line 38
NSBJRVRGLiBVbmF1dGhvcml6ZWQgZHVwbGljYXRpb24gcHJvaGliaXRlZC4A NSBJRVRGLiBVbmF1dGhvcml6ZWQgZHVwbGljYXRpb24gcHJvaGliaXRlZC4A
etc... etc...
--boundary-example-1-- --boundary-example-1--
10. Content-Disposition header 10. Content-Disposition header
Note the specification in [REL] on the relations between Content- Note the specification in [REL] on the relations between Content-
Disposition and multipart/related. Disposition and multipart/related.
11. Character encoding issues 11. Character encoding issues and end-of-line issues
11.1 Character set issues
For the encoding of characters in an HTML document into a MIME- For the encoding of characters in HTML documents and other text
compatible octet stream, the following three mechanisms are relevant: documents into a MIME-compatible octet stream, the following mechanisms
are relevant:
- HTML [HTML2] as an application of SGML [SGML] allows characters to be - HTML [HTML2, HTML-I18N] as an application of SGML [SGML] allows
denoted by character entities as well as by numeric character characters to be denoted by character entities as well as by numeric
references character references (e.g. "Latin small letter a with acute accent"
(e.g. "latin small letter a with acute" may be represented by may
"&aacute;" be represented by "&aacute;" or "&#225;") in the HTML markup.
or "& 225;") in the HTML markup.
- HTML documents, in common with other documents of the MIME content- - HTML documents, in common with other documents of the MIME content-
type type
text, can use various kinds of character encodings which are indicated "text", can be represented in MIME using one of several character
by the value of the "charset" parameter in the MIME content-type encodings. The MIME content-type "charset" parameter value indicates
header. the particular encoding used. For the exact meaning and use of the
For the exact meaning and use of the "charset" parameter, please see "charset" parameter, please see [MIME-IMB section 4.2].
[MIME1 section 7.1.1]. Note that the "charset" parameter refers to the
charset in the HTML markup, not to the charset in the displayed text. Note that the "charset" parameter refers only to the MIME character
Thus, if the HTML markup contains only US-ASCII characters, then the encoding. For example, the string "&aacute;" can be sent in MIME with
value of the charset parameter should be US-ASCII, even if the HTML "charset=US-ASCII", while the raw character "Latin small letter a with
markup contains entities which cause the displayed text to show acute accent" cannot.
non-US-ASCII-characters.
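
Illustration (not part of either draft): a minimal sketch of the point made in the note above, that the "charset" parameter describes the characters actually present in the markup, not the characters eventually displayed. The helper name fits_charset and the sample strings are hypothetical.

```python
def fits_charset(text, charset):
    """Return True if every character of text can be encoded in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


markup_with_entity = "caf&eacute;"   # only US-ASCII characters in the markup
markup_with_raw = "caf\u00e9"        # raw "Latin small letter e with acute"

print(fits_charset(markup_with_entity, "us-ascii"))
print(fits_charset(markup_with_raw, "us-ascii"))
print(fits_charset(markup_with_raw, "iso-8859-1"))
```

A sender could use a check of this kind to decide whether charset=US-ASCII is a truthful label or whether a wider charset such as ISO-8859-1 has to be declared.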
The above mechanisms are well defined and documented, and therefore not
further explained here. In sending a message, all the above mentioned
mechanisms MAY be used, and any mixture of them MAY occur when sending
the document via e-mail. Receiving mail user agents (together with the
Web browser they may use to display the document) MUST be capable of
handling any combinations of these mechanisms.
Also note that:
- Any documents including HTML documents that contain octet values - Any documents including HTML documents that contain octet values
outside outside
the 7-bit range or that contain bare CRs or bare LFs need a content- the 7-bit range need a content-transfer-encoding applied before
transfer-encoding applied before transmission over certain transport transmission over certain transport protocols [MIME1, chapter 5].
protocols [MIME1, chapter 5].
The above three mechanisms are well defined and documented, and - The MIME standard [MIME1] requires that documents of "Content-
therefore not further explained here. In sending a message, all the Type:text"
abovementioned mechanisms MAY be used, and any mixture of them MAY occur MUST be in canonical form before Content-Transfer-Encoding, i.e. that
when sending the document via e-mail. Receiving mailers (together with line breaks are encoded as CRLFs, not as bare CRs or bare LFs or
the Web browser they may use to display the document) MUST be capable of something else. This is in contrast to [HTTP] where section 3.6.1
handling any combinations of these mechanisms. allows other representations of line breaks.
Some transport mechanisms may specify a default "charset" parameter if Note that this might cause problems with integrity checks based on
none is suppled [HTTP, MIME1]. Because the default differs for different checksums, which might not be preserved when moving a document from the
mechanisms, when HTML is transferred through mail, the charset parameter HTTP to the MIME environment. If a document has to be converted in such
SHOULD be included, rather than relying on the default. a way that a checksum integrity check becomes invalid, then this
integrity check header SHOULD be removed from the document.
Example of non-US-ASCII characters in HTML: See section 9.3 above. Other sources of problems are "Content-Encoding" used in HTTP but not
allowed in MIME, and "charsets that are not able to represent line
breaks as CRLF. A good overview of the differences between HTTP and MIME
with regards to Content-Type:text" can be found in [HTTP], appendix C.
11.2 Line break characters If the original document has line breaks in the canonical form (CRLF),
then the document SHOULD remain unconverted so that integrity check sums
are not invalidated.
The MIME standard [MIME1] specifies that line breaks in the MIME content A provider of HTML documents who wants his documents to be transferable
MUST be CRLF. The HTTP standard [HTTP] specifies that line breaks in via both HTTP and SMTP without invalidating checksum integrity checks,
transported HTML markup may be either bare CRs, bare LFs or CRLFs. To should always provide original documents in the canonical form with CRLF
allow data integrity checks through checksums, MIME content-transfer- for line breaks.
encoding of line breaks SHOULD, if necessary, be used so that after
decoding, the line break representation of the original HTML markup is
returned.
Note that since the mail content-MD5 is defined to a canonical form with Some transport mechanisms may specify a default "charset" parameter if
all line breaks converted to CRLF, while the HTTP content-MD5 is defined none is supplied [HTTP, MIME1]. Because the default differs for
to apply to the transmitted form. This means that the Content-MD5 HTTP different mechanisms, when HTML is transferred through mail, the charset
header may not be correct for Text/HTML that is retrieved from a HTTP parameter SHOULD be included, rather than relying on the default.
server and then sent via mail.
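
Illustration (not part of either draft): a minimal sketch of why converting line breaks to the canonical CRLF form can invalidate a checksum computed over the form transmitted via HTTP. MD5 is used only because Content-MD5 is the checksum mentioned in this section; the function names are hypothetical.

```python
import hashlib
import re


def to_canonical(data):
    """Rewrite bare CR, bare LF and CRLF line breaks as CRLF."""
    return re.sub(rb"\r\n|\r|\n", b"\r\n", data)


def md5_hex(data):
    """Hex MD5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


as_served = b"<P>Hello\n<P>World\n"      # bare LFs, as HTTP permits
canonical = to_canonical(as_served)       # CRLFs, as MIME text requires

print(md5_hex(as_served) == md5_hex(canonical))
print(to_canonical(canonical) == canonical)
```

The first comparison is False because the conversion changes the octets. The second is True: a document that already uses CRLF is left untouched, so its checksum survives the move from HTTP to mail.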
12. Security Considerations 12. Security Considerations
Some Security Considerations include the potential to mail someone an Some Security Considerations include the potential to mail someone an
object, and claim that it is represented by a particular URI (by giving object, and claim that it is represented by a particular URI (by giving
it a Content-Location: header). There can be no assurance that a WWW it a Content-Location: header). There can be no assurance that a WWW
request for that same URI would normally result in that same object. It request for that same URI would normally result in that same object. It
might be unsuitable to cache the data in such a way that the cached data might be unsuitable to cache the data in such a way that the cached data
can be used for retrieval of this URI from other messages or message can be used for retrieval of this URI from other messages or message
parts than those included in the same message as the Content-Location parts than those included in the same message as the Content-Location
skipping to change at page 16, line 40 skipping to change at page 16, line 20
[CONDISP] R. Troost, S. Dorner: "Communicating Presentation [CONDISP] R. Troost, S. Dorner: "Communicating Presentation
Information in Internet Messages: The Content- Information in Internet Messages: The Content-
Disposition Header", RFC 1806, June 1995. Disposition Header", RFC 1806, June 1995.
[HOSTS] R. Braden (editor): "Requirements for Internet Hosts -- [HOSTS] R. Braden (editor): "Requirements for Internet Hosts --
Application and Support", STD-3, RFC 1123, October 1989. Application and Support", STD-3, RFC 1123, October 1989.
[HTML2] T. Berners-Lee, D. Connolly: "Hypertext Markup Language [HTML2] T. Berners-Lee, D. Connolly: "Hypertext Markup Language
- 2.0", RFC 1866, November 1995. - 2.0", RFC 1866, November 1995.
[HTML-I18N] F. Yergeau, G. Nicol, G. Adams, & M. Duerst:
"Internationalization of the Hypertext Markup
Language". draft-ietf-html-i18n-04.txt, May 1996.
[HTTP] T. Berners-Lee, R. Fielding, H. Frystyk: Hypertext [HTTP] T. Berners-Lee, R. Fielding, H. Frystyk: Hypertext
Transfer Protocol -- HTTP/1.0. RFC 1945, May 1996. Transfer Protocol -- HTTP/1.0. RFC 1945, May 1996.
[MD5] R. Rivest, "The MD5 Message-Digest Algorithm", RFC 1321,
April 1992.
[MIDCID] E. Levinson: "Message/External-Body Content-ID Access [MIDCID] E. Levinson: "Message/External-Body Content-ID Access
Type", RFC 1873, December 1995. Type", draft-ietf-mhtml-cid-00.txt, August 1996.
[MIME1] N. Borenstein & N. Freed: "MIME (Multipurpose Internet [MIME1] N. Borenstein & N. Freed: "MIME (Multipurpose Internet
Mail Extensions) Part One: Mechanisms for Specifying and Mail Extensions) Part One: Mechanisms for Specifying and
Describing the Format of Internet Message Bodies", RFC Describing the Format of Internet Message Bodies", RFC
1521, Sept 1993. 1521, Sept 1993.
[MIME2] N. Borenstein & N. Freed: "Multipurpose Internet Mail [MIME2] N. Borenstein & N. Freed: "Multipurpose Internet Mail
Extensions (MIME) Part Two: Media Types". draft-ietf- Extensions (MIME) Part Two: Media Types". draft-ietf-
822ext-mime-imt-02.txt, December 1995. 822ext-mime-imt-02.txt, December 1995.
[MIME-IMB] N. Freed & N. Borenstein: "Multipurpose Internet Mail
Extensions (MIME) Part One: Format of Internet Message
Bodies". draft-ietf-822ext-mime-imb-07.txt, June 1996.
[NEWS] M.R. Horton, R. Adams: "Standard for interchange of [NEWS] M.R. Horton, R. Adams: "Standard for interchange of
USENET messages", RFC 1036, December 1987. USENET messages", RFC 1036, December 1987.
[PDF] Bienz, T., Cohn, R. and Meehan, J.: "Portable Document [PDF] Bienz, T., Cohn, R. and Meehan, J.: "Portable Document
Format Reference Manual, Version 1.1", Adobe Systems Format Reference Manual, Version 1.1", Adobe Systems
Inc. Inc.
[REL] Harald Tveit Alvestrand, Edward Levinson: "The MIME [REL] Harald Tveit Alvestrand, Edward Levinson: "The MIME
Multipart/Related Content-type", <draft-levinson- Multipart/Related Content-type", <draft-ietf-mhtml-
multipart-related-00.txt>, January 1995. related-00.txt>, May 1995.
[RELURL] R. Fielding: "Relative Uniform Resource Locators", RFC [RELURL] R. Fielding: "Relative Uniform Resource Locators", RFC
1808, June 1995. 1808, June 1995.
[RFC822] D. Crocker: "Standard for the format of ARPA Internet [RFC822] D. Crocker: "Standard for the format of ARPA Internet
text messages." STD 11, RFC 822, August 1982. text messages." STD 11, RFC 822, August 1982.
[SGML] ISO 8879. Information Processing -- Text and Office - [SGML] ISO 8879. Information Processing -- Text and Office -
Standard Generalized Markup Language (SGML), Standard Generalized Markup Language (SGML),
1986. <URL:http://www.iso.ch/cate/d16387.html> 1986. <URL:http://www.iso.ch/cate/d16387.html>
skipping to change at page 17, line 45 skipping to change at page 17, line 38
Alex Hopmann. Alex Hopmann.
Jacob Palme Phone: +46-8-16 16 67 Jacob Palme Phone: +46-8-16 16 67
Stockholm University and KTH Fax: +46-8-783 08 29 Stockholm University and KTH Fax: +46-8-783 08 29
Electrum 230 E-mail: jpalme@dsv.su.se Electrum 230 E-mail: jpalme@dsv.su.se
S-164 40 Kista, Sweden S-164 40 Kista, Sweden
Alex Hopmann Alex Hopmann
President President
ResNova Software, Inc. E-mail: alex.hopmann@resnova.com ResNova Software, Inc. E-mail: alex.hopmann@resnova.com
5011 Argosy Dr. 13 5011 Argosy Dr. #13
Huntington Beach, CA 92649 Huntington Beach, CA 92649
Working group chairman: Einar Stefferud <stef@nma.com> Working group chairman:
Einar Stefferud <stef@nma.com>
 End of changes. 27 change blocks. 
65 lines changed or deleted 99 lines changed or added
