draft-ietf-mhtml-spec-01.txt   draft-ietf-mhtml-spec-02.txt 
Network Working Group Jacob Palme Network Working Group Jacob Palme
Internet Draft Stockholm University/KTH Internet Draft Stockholm University/KTH
draft-ietf-mhtml-spec-01.txt Alexander Hopmann draft-ietf-mhtml-spec-02.txt Alexander Hopmann
Category-to-be: Proposed standard ResNova Software, Inc. Category-to-be: Proposed standard ResNova Software, Inc.
MIME E-mail Encapsulation of Aggregate HTML Documents (MHTML) MIME E-mail Encapsulation of Aggregate Documents, such as HTML (MHTML)
Status of this Document Status of this Document
This document is an Internet-Draft. Internet-Drafts are working This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, and documents of the Internet Engineering Task Force (IETF), its areas, and
its working groups. Note that other groups may also distribute working its working groups. Note that other groups may also distribute working
documents as Internet-Drafts. documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
skipping to change at page 1, line 36 skipping to change at page 1, line 36
Abstract Abstract
Although HTML [RFC 1866] was designed within the context of MIME, more Although HTML [RFC 1866] was designed within the context of MIME, more
than the specification of HTML as defined in RFC 1866 is needed for two than the specification of HTML as defined in RFC 1866 is needed for two
electronic mail user agents to be able to interoperate using HTML as a electronic mail user agents to be able to interoperate using HTML as a
document format. These issues include the naming of objects that are document format. These issues include the naming of objects that are
normally referred to by URIs, and the means of aggregating objects that normally referred to by URIs, and the means of aggregating objects that
go together. This document describes a set of guidelines that will allow go together. This document describes a set of guidelines that will allow
conforming mail user agents to be able to send, deliver and display conforming mail user agents to be able to send, deliver and display
these HTML objects. In addition it is hoped that these techniques will these objects, such as HTML objects, that can contain links represented
also apply to the wider category of URI-enabled objects. In order to do by URIs. In order to be able to handle inter-linked objects, the
this, the document specifies the MIME content-headers "Content-Location" document proposes to use the MIME type multipart/related and specifies
and "Content-Base". the MIME content-headers "Content-Location" and "Content-Base".
Table of Contents Table of Contents
1. Introduction 1. Introduction
2. Terminology 2. Terminology
2.1 Conformance requirement terminology 2.1 Conformance requirement terminology
2.2 Other terminology 2.2 Other terminology
4. The Content-Location and Content-Base MIME Content Headers 4. The Content-Location and Content-Base MIME Content Headers
4.1 MIME content headers 4.1 MIME content headers
4.2 The Content-Base header 4.2 The Content-Base header
4.3 The Content-Location Header 4.3 The Content-Location Header
4.4 Encoding of URIs in e-mail headers 4.4 Encoding of URIs in e-mail headers
5. Base URIs for resolution of relative URIs 5. Base URIs for resolution of relative URIs
6. Sending HTML documents without linked objects 6. Sending documents without linked objects
7. Use of the Content-Type: Multipart/related 7. Use of the Content-Type: Multipart/related
8. Format of Links to Other Body Parts 8. Format of Links to Other Body Parts
8.1 General principle 8.1 General principle
8.2 Use of the Content-Location header 8.2 Use of the Content-Location header
8.3 Use of the Content-ID header and CID URLs 8.3 Use of the Content-ID header and CID URLs
9 Examples 9 Examples
9.1 Example of a HTML body without included linked objects 9.1 Example of a HTML body without included linked objects
9.3 Example with relative URIs to an embedded GIF picture 9.3 Example with relative URIs to an embedded GIF picture
9.4 Example using CID URL and Content-ID header to an embedded GIF 9.4 Example using CID URL and Content-ID header to an embedded GIF
picture picture
10. Content-Disposition header 10. Content-Disposition header
11. Encoding Considerations for HTML bodies 11. Character encoding issues
11.1 Character set issues 11.1 Character set issues
11.2 Line break characters 11.2 Line break characters
12. Security Considerations 12. Security Considerations
13. Acknowledgments 13. Acknowledgments
14. References 14. References
15. Author's Address 15. Author's Address
Mailing List Information Mailing List Information
Further discussion on this document should be done through the mailing Further discussion on this document should be done through the mailing
skipping to change at page 2, line 58 skipping to change at page 3, line 26
FTP://SEGATE.SUNET.SE/lists/mHTML/ FTP://SEGATE.SUNET.SE/lists/mHTML/
The archives are also available by e-mail. Send a message to The archives are also available by e-mail. Send a message to
LISTSERV@SEGATE.SUNET.SE with the text "INDEX MHTML" to get a list of LISTSERV@SEGATE.SUNET.SE with the text "INDEX MHTML" to get a list of
the archive files, and then a new message "GET <file name>" to retrieve the archive files, and then a new message "GET <file name>" to retrieve
the archive files. the archive files.
Comments on less important details may also be sent to the editor, Jacob Comments on less important details may also be sent to the editor, Jacob
Palme <jpalme@dsv.su.se>. Palme <jpalme@dsv.su.se>.
More information may also be available at URL: More information may also be available at URL:
HTTP://www.dsv.su.se/~jpalme/ietf/jp-ietf-home.HTML> HTTP://www.dsv.su.se/~jpalme/ietf/jp-ietf-home.HTML
1. Introduction 1. Introduction
The HTML format is a very common format for documents in the Internet, There are a number of document formats, HTML [HTML2], PDF [PDF] and VRML
and there is an obvious need to be able to send documents in this format for example, which provide links using URIs for their resolution. There
in e-mail [RFC821=SMTP, RFC822]. The "text/html" media type is defined is an obvious need to be able to send documents in these formats in e-
in RFC 1866 [HTML2]. This document gives additional specifications on mail [RFC821=SMTP, RFC822]. This document gives additional
how to use the text/html media type as a Content-Type in MIME [RFC specifications on how to send such documents in MIME [RFC 1521=MIME1] e-
1521=MIME1] e-mail messages. HTML documents commonly include links to mail messages. This version of this standard was based on full
other objects and resources, either embedded or directly accessible consideration only of the needs for objects with links in the Text/HTML
through hypertext links. When mailing a HTML document, it is often media type (as defined in RFC 1866 [HTML2]), but the standard may still
desirable to also mail all of the additional resources that are be applicable also to other formats for sets of interlinked objects,
referenced in it; those elements are necessary for the complete linked by URIs. There is no conformance requirement that implementations
interpretation of the HTML. claiming conformance to this standard are able to handle URI-s in other
document formats than HTML.
An alternative way for sending HTML documents in e-mail is to only send URIs in documents in HTML and other similar formats reference other
the URL, and let the recipient look up the document using HTTP. That objects and resources, either embedded or directly accessible through
method is described in [URLBODY] and is not described in this document. hypertext links. When mailing such a document, it is often desirable to
also mail all of the additional resources that are referenced in it;
those elements are necessary for the complete interpretation of the
primary object.
An alternative way for sending an HTML document or other object
containing URIs in e-mail is to only send the URL, and let the recipient
look up the document using HTTP. That method is described in [URLBODY]
and is not described in this document.
2. Terminology 2. Terminology
2.1 Conformance requirement terminology 2.1 Conformance requirement terminology
This specification uses the same words as RFC 1123 [HOSTS] for defining This specification uses the same words as RFC 1123 [HOSTS] for defining
the significance of each particular requirement. These words are: the significance of each particular requirement. These words are:
MUST This word or the adjective "required" means that the item is MUST This word or the adjective "required" means that the item is
an absolute requirement of the specification. an absolute requirement of the specification.
skipping to change at page 4, line 22 skipping to change at page 4, line 50
CID See [MIDCID]. CID See [MIDCID].
Content-Base See section 4.2 below. Content-Base See section 4.2 below.
Content-ID See [MIDCID]. Content-ID See [MIDCID].
Content-Location MIME message or content part header with the URI of Content-Location MIME message or content part header with the URI of
the MIME message or content part body, defined in the MIME message or content part body, defined in
section 4.3 below. section 4.3 below.
Content-Transfer- Conversion of a text into 7-bit octets as specified
Encoding in [MIME1].
CR See [RFC822]. CR See [RFC822].
CRLF See [RFC822]. CRLF See [RFC822].
Displayed text The text shown to the user reading a document with
a web browser. This may be different from the HTML
markup, see the definition of HTML markup below.
Header Field in a message or content heading specifying Header Field in a message or content heading specifying
the value of one attribute. the value of one attribute.
Heading Part of a message or content before the first Heading Part of a message or content before the first
CRLFCRLF, containing formatted fields with CRLFCRLF, containing formatted fields with
attributes of the message or content. attributes of the message or content.
HTML See RFC 1866 [HTML2]. HTML See RFC 1866 [HTML2].
HTML Aggregate HTML objects together with some or all objects, to HTML Aggregate HTML objects together with some or all objects, to
objects which the HTML object contains hyperlinks. objects which the HTML object contains hyperlinks.
HTML markup A file containing HTML encodings as specified in
[HTML] which may be different from the displayed
text which a person using a web browser sees. For
example, the HTML markup may contain "&lt;" where
the displayed text contains the character "<".
LF See [RFC822]. LF See [RFC822].
MIC Message Integrity Codes, codes used to verify that a MIC Message Integrity Codes, codes used to verify that a
message has not been illegally modified. message has not been illegally modified.
MIME See RFC 1521 [MIME1], [MIME2]. MIME See RFC 1521 [MIME1], [MIME2].
MUA Messaging User Agent. MUA Messaging User Agent.
PDF Portable Document Format, see [PDF].
Relative URI, See RFC 1866 [HTML2] and RFC 1808[RELURL]. Relative URI, See RFC 1866 [HTML2] and RFC 1808[RELURL].
RelativeURI RelativeURI
URI, absolute and See RFC 1866 [HTML2]. URI, absolute and See RFC 1866 [HTML2].
relative relative
URL See RFC 1738 [URL]. URL See RFC 1738 [URL].
URL, relative See [RELURL]. URL, relative See [RELURL].
VRML Virtual Reality Markup Language.
3. Overview 3. Overview
An aggregate HTML object is a MIME-encoded message that contains a root An aggregate document is a MIME-encoded message that contains a root
document as well as other data that is required in order to represent document as well as other data that is required in order to represent
that document (inline pictures, style sheets, applets, etc.). Aggregate that document (inline pictures, style sheets, applets, etc.). Aggregate
HTML objects can also include additional elements that are linked to the documents can also include additional elements that are linked to the
first object. It is important to keep in mind the differing needs of first object. It is important to keep in mind the differing needs of
several audiences. Mail sending agents might send aggregate HTML objects several audiences. Mail sending agents might send aggregate documents as
as an encoding of normal day-to-day electronic mail. Mail sending agents an encoding of normal day-to-day electronic mail. Mail sending agents
might also send aggregate HTML objects when a user wishes to mail a might also send aggregate documents when a user wishes to mail a
particular document from the web to someone else. Finally mail sending particular document from the web to someone else. Finally mail sending
agents might send aggregate HTML documents as automatic responders, agents might send aggregate documents as automatic responders, providing
providing access to WWW resources for non-IP connected clients. access to WWW resources for non-IP connected clients.
Mail receiving agents also have several differing needs. Some mail Mail receiving agents also have several differing needs. Some mail
receiving agents might be able to receive an aggregate HTML document and receiving agents might be able to receive an aggregate document and
display it just as any other text content type would be displayed. display it just as any other text content type would be displayed.
Others might have to pass this aggregate HTML document to an HTML Others might have to pass this aggregate document to a browsing program,
browsing program, and provisions need to be made to make this possible. and provisions need to be made to make this possible.
Finally several other constraints on the problem arise. It is important Finally several other constraints on the problem arise. It is important
that it be possible for an HTML document to be signed and for it to be that it be possible for a document to be signed and for it to be able to
able to be transmitted to a client and displayed with a minimum risk of be transmitted to a client and displayed with a minimum risk of breaking
breaking the message integrity (MIC) check that is part of the the message integrity (MIC) check that is part of the signature.
signature.
4. The Content-Location and Content-Base MIME Content Headers 4. The Content-Location and Content-Base MIME Content Headers
4.1 MIME content headers 4.1 MIME content headers
In order to resolve URI references to other body parts, two MIME content In order to resolve URI references to other body parts, two MIME content
headers are defined, Content-Location and Content-Base. Both these headers are defined, Content-Location and Content-Base. Both these
headers can occur in any message or content heading, and will then be headers can occur in any message or content heading, and will then be
valid within this heading and for its content. valid within this heading and for its content.
skipping to change at page 6, line 6 skipping to change at page 6, line 53
where URI is at present (June 1996) restricted to the syntax for URLs as where URI is at present (June 1996) restricted to the syntax for URLs as
defined in RFC 1738 [URL]. defined in RFC 1738 [URL].
These two headers are valid only for exactly the content heading or These two headers are valid only for exactly the content heading or
message heading where they occur and its text. They are thus not valid message heading where they occur and its text. They are thus not valid
for the parts inside multipart headings, and are thus meaningless in for the parts inside multipart headings, and are thus meaningless in
multipart headings. multipart headings.
These two headers may occur both inside and outside of a These two headers may occur both inside and outside of a
Multipart/Related part. multipart/related part.
4.2 The Content-Base header 4.2 The Content-Base header
The Content-Base gives a base for relative URIs occurring in other The Content-Base gives a base for relative URIs occurring in other
heading fields and in content which do not have any BASE element in its heading fields and in content which do not have any BASE element in its
HTML code. Its value MUST be an absolute URI. HTML code. Its value MUST be an absolute URI.
Example showing which Content-Base is valid where: Example showing which Content-Base is valid where:
Content-Type: Multipart/related; boundary="boundary-example-1"; Content-Type: Multipart/related; boundary="boundary-example-1";
skipping to change at page 8, line 5 skipping to change at page 9, line 5
URIs. For example, HTML provides the BASE element for this. URIs. For example, HTML provides the BASE element for this.
(b) There is a Content-Base header (as defined in section 4.2), (b) There is a Content-Base header (as defined in section 4.2),
specifying the base to be used. specifying the base to be used.
(c) There is a Content-Location header in the heading of the body (c) There is a Content-Location header in the heading of the body
part which can then serve as the base in the same way as the part which can then serve as the base in the same way as the
request URI can serve as a base for relative URIs within a file request URI can serve as a base for relative URIs within a file
retrieved via HTTP [HTTP]. retrieved via HTTP [HTTP].
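The following Python sketch is illustrative only and is not part of either draft revision. It assumes that the sources listed above are tried in the order given: a base specification inside the body part, then Content-Base, then Content-Location. The function and parameter names are hypothetical.

```python
from urllib.parse import urljoin


def select_base(body_base=None, content_base=None, content_location=None):
    """Return the first available base URI, or None if no base is known.

    Assumed priority: (a) base inside the body part (for example the HTML
    BASE element), (b) Content-Base header, (c) Content-Location header.
    """
    for candidate in (body_base, content_base, content_location):
        if candidate:
            return candidate
    return None


def resolve(link, **bases):
    """Resolve a possibly relative link against the selected base."""
    base = select_base(**bases)
    # Without any base the link is left untouched.
    return urljoin(base, link) if base else link
```

For instance, resolve("logo.gif", content_location="http://www.example.com/docs/index.html") would yield the absolute URI of logo.gif in the same directory, in the same way as for a file retrieved via HTTP.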
6. Sending HTML documents without linked objects 6. Sending documents without linked objects
If an HTML document is sent without other objects, to which it is If a document, such as an HTML object, is sent without other objects, to
linked, it MAY be sent as a Text/HTML body part by itself. In this case, which it is linked, it MAY be sent as a Text/HTML body part by itself.
Multipart/related need not be used. In this case, multipart/related need not be used.
Such a document may either not include any links, or contain links which Such a document may either not include any links, or contain links which
the recipient resolves via ordinary net look up, or contain links which the recipient resolves via ordinary net look up, or contain links which
the recipient cannot resolve. the recipient cannot resolve.
Inclusion of links which the recipient has to look up through the net Inclusion of links which the recipient has to look up through the net
may not work for some recipients, since not all e-mail recipients may not work for some recipients, since not all e-mail recipients
have full internet connectivity. Also, such links may work for the have full internet connectivity. Also, such links may work for the
sender but not for the recipient, for example when the link refers to a sender but not for the recipient, for example when the link refers to a
URL within a company-internal network not accessible from outside the URI within a company-internal network not accessible from outside the
company. company.
Note that documents with links that the recipient cannot resolve MAY be Note that documents with links that the recipient cannot resolve MAY be
sent, although this is discouraged. For example, two persons developing sent, although this is discouraged. For example, two persons developing
a new HTML page may exchange incomplete versions. a new HTML page may exchange incomplete versions.
7. Use of the Content-Type: Multipart/related 7. Use of the Content-Type: Multipart/related
The use of URI references creates some additional issues for aggregate If a message contains one or more MIME body parts containing links and
HTML objects. Normal URI references can of course be used, however it is also contains as separate body parts, data, to which these links (as
likely that many user agents may not be able to retrieve those objects defined, for example, in RFC 1866 [HTML2]) refer, then this whole set
referred to. This document provides a means for these additional objects of body parts (referring body parts and referred-to body parts) SHOULD
to be transmitted with the HTML and for the links between these objects be sent within a multipart/related body part as defined in [REL].
to be properly resolved.
If a message contains one or more Text/HTML body parts and also contains
as separate body parts, data, to which hyperlinks (as defined in RFC
1866 [HTML2]) in the Text/HTML body parts refer, then this set of
objects SHOULD be sent within a Multipart/Related body part as defined
in [REL].
The root of the Multipart/related SHOULD be of the Content-Type: The root body part of the multipart/related SHOULD be the start object
Text/HTML. Use of the Content-Type: Multipart/Alternative, one of whose for rendering the object, such as a text/html object, and which contains
parts is of Content-Type: Text/HTML, is also allowed, but implementors links to objects in other body parts, or a multipart/alternative of
are warned that many mail programs treat Multipart/Alternative as if it which at least one alternative resolves to such a start object.
had been Multipart/Mixed (even though MIME [MIME1] requires support for Implementors are warned, however, that many mail programs treat
Multipart/Alternative). multipart/alternative as if it had been multipart/mixed (even though
MIME [MIME1] requires support for multipart/alternative).
If the root is not the first body part within the Multipart/related, its [REL] requires that the type attribute of the "Content-Type:
Content-ID MUST be given in a start parameter to the Content-Type: multipart/related" statement be the type of the root object, and this
Multipart/Related header. value can thus be "multipart/alternative". If the root is not the first
body part within the multipart/related, [REL] further requires that its
Content-ID MUST be given in a start parameter to the "Content-Type:
multipart/related" header.
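A receiving agent could locate the root body part roughly as in the Python sketch below. It is illustrative only and not part of either draft revision. The sample message, the addresses under example.invalid and the function name find_root are made up for the illustration. Falling back to the first body part when the start value matches nothing is an assumption, not something the text above specifies.

```python
from email import message_from_string


def find_root(related):
    """Return the root body part of a parsed multipart/related message."""
    parts = related.get_payload()
    start = related.get_param("start")
    if start:
        wanted = start.strip().strip("<>")
        for part in parts:
            content_id = (part.get("Content-ID") or "").strip().strip("<>")
            if content_id == wanted:
                return part
    # No start parameter: the first body part is the root.
    # (Assumption: the same fallback is used if start matches no part.)
    return parts[0]


RAW = """\
Mime-Version: 1.0
Content-Type: multipart/related; boundary="boundary-example-1";
 type="text/html"; start="<root@example.invalid>"

--boundary-example-1
Content-Type: image/gif
Content-ID: <logo@example.invalid>
Content-Transfer-Encoding: base64

R0lGODlhAQABAAAAACw=
--boundary-example-1
Content-Type: text/html; charset=US-ASCII
Content-ID: <root@example.invalid>

<html><body><img src="cid:logo@example.invalid"></body></html>
--boundary-example-1--
"""

message = message_from_string(RAW)
root = find_root(message)
```

Here the root is the second body part, which is why the start parameter is needed. The type parameter of the enclosing multipart/related agrees with the Content-Type of that root.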
When presenting the root body part to the user, the additional body When presenting the root body part to the user, the additional body
parts within the Multipart/related can be used: parts within the multipart/related can be used:
(a) For those recipients who only have e-mail but not full Internet (a) For those recipients who only have e-mail but not full Internet
access. access.
(b) For those recipients who for other reasons, such as firewalls (b) For those recipients who for other reasons, such as firewalls
or the use of company-internal links, cannot retrieve the or the use of company-internal links, cannot retrieve the
linked body parts through the net. linked body parts through the net.
Note that this means that you can, via e-mail, send HTML which Note that this means that you can, via e-mail, send HTML which
includes URIs which the recipient cannot resolve via HTTP or includes URIs which the recipient cannot resolve via HTTP or
other connectivity-requiring URIs. other connectivity-requiring URIs.
(c) For items which are not available on the web. (c) For items which are not available on the web.
(d) For any recipient to speed up access. (d) For any recipient to speed up access.
The type parameter of the Content-Type: Multipart/related MUST be the The type parameter of the Content-Type: multipart/related MUST be the
same as the Content-Type of its root. same as the Content-Type of its root.
When a sending MUA sends objects which were retrieved from the WWW, it When a sending MUA sends objects which were retrieved from the WWW, it
SHOULD maintain their WWW URIs. It SHOULD NOT transform these URIs into SHOULD maintain their WWW URIs. It SHOULD NOT transform these URIs into
some other URI form prior to transmitting them. This will allow the some other URI form prior to transmitting them. This will allow the
receiving MUA to both verify MICs included with the email message, as receiving MUA to both verify MICs included with the email message, as
well as verify the documents against their WWW counterparts. well as verify the documents against their WWW counterparts.
The Text/HTML body MAY contain links to MIME body parts outside of the This standard does not cover the case where a multipart/related contains
Multipart/Related or in other messages, but such usage is discouraged. links to MIME body parts outside of the current multipart/related or in
Implementors are warned that many receiving mailers may not be able to other MIME messages, even if methods similar to those described in this
resolve such links. standard are used. Implementors who provide such links are warned that
mailers implementing this standard may not be able to resolve such
links.
Within such a Multipart/related, ALL different parts MUST have different Within such a multipart/related, ALL different parts MUST have different
Content-Location or Content-ID values. Content-Location or Content-ID values.
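On the sending side, the requirements of this section might be met as in the Python sketch below, which uses the standard email package. It is illustrative only and not part of either draft revision. The Content-ID under example.invalid, the placeholder image bytes and the helper names are assumptions made for the illustration. The image keeps its WWW URI as its Content-Location rather than being renamed.

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

LOGO_URL = "http://www.ietf.cnri.reston.va.us/images/ietflogo.gif"
ROOT_ID = "<root@example.invalid>"  # hypothetical Content-ID


def build_related(html, image_bytes):
    """Build a multipart/related message with an HTML root and one image."""
    root = MIMEText(html, "html", "us-ascii")
    root["Content-ID"] = ROOT_ID
    image = MIMEImage(image_bytes, _subtype="gif")
    # Keep the original WWW URI instead of inventing a new one.
    image["Content-Location"] = LOGO_URL
    # type names the root's media type; start names the root's Content-ID.
    message = MIMEMultipart("related", type="text/html", start=ROOT_ID)
    message.attach(root)
    message.attach(image)
    return message


def labels_are_distinct(message):
    """Check that no Content-Location or Content-ID value is repeated."""
    labels = []
    for part in message.get_payload():
        for header in ("Content-Location", "Content-ID"):
            if part[header] is not None:
                labels.append(part[header])
    return len(labels) == len(set(labels))


example = build_related(
    '<html><body><img src="%s" alt="IETF logo"></body></html>' % LOGO_URL,
    b"GIF89a",  # placeholder bytes, not a complete image
)
```

The start parameter is redundant here because the root is attached first. It is included only to show where it would go.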
8. Format of Links to Other Body Parts 8. Format of Links to Other Body Parts
8.1 General principle 8.1 General principle
A Text/HTML body part may contain hyperlinks to objects which are A body part, such as a text/HTML body part, may contain hyperlinks to
included as other body parts in the same message and within the same objects which are included as other body parts in the same message and
multipart/related content. Often such linked objects are meant to be within the same multipart/related content. Often such linked objects are
displayed inline to the reader of the main document. HTML version 2.0 meant to be displayed inline to the reader of the main document; for
[RFC 1866=HTML2] has only one way of specifying hyperlinks to such example, objects referenced with the IMG tag in HTML [RFC 1866=HTML2].
inline embedded content, the IMG tag. New tags with this property are New tags with this property are proposed in the ongoing development of
however proposed in the ongoing development of HTML (example: applet, HTML (example: applet, frame).
frame).
In order to send such messages, there is a need to indicate which other In order to send such messages, there is a need to indicate which other
body parts are referred to by the links in the Text/HTML body parts. body parts are referred to by the links in the body parts containing
This is done in the following way: For each distinct URI in the such links. For example, a body part of Content-Type: Text/HTML often
Text/HTML document, which refers to data which is sent in the same MIME has links to other objects, which might be included in other body parts
message, there SHOULD be a separate body part within the in the same MIME message. The referencing of other body parts is done in
the following way: For each body part containing links and each distinct
URI within it, which refers to data which is sent in the same MIME
message, there SHOULD be a separate body part within the current
multipart/related part of the message containing this data. Each such multipart/related part of the message containing this data. Each such
body part SHOULD contain a Content-Location header (see section 8.2) or body part SHOULD contain a Content-Location header (see section 8.2) or
a Content-ID header (see section 8.3). a Content-ID header (see section 8.3).
An e-mail system which claims conformance to this standard MUST support An e-mail system which claims conformance to this standard MUST support
receipt of Multipart/related (as defined in section 7) with links receipt of multipart/related (as defined in section 7) with links
between body parts using both the Content-Location (as defined in between body parts using both the Content-Location (as defined in
section 8.2) and the Content-ID method (as defined in section 8.3). section 8.2) and the Content-ID method (as defined in section 8.3).
8.2 Use of the Content-Location header 8.2 Use of the Content-Location header
If there is a Content-Base header, then the recipient MUST employ If there is a Content-Base header, then the recipient MUST employ
relative to absolute resolution as defined in RFC 1808 [RELURL] of URIs relative to absolute resolution as defined in RFC 1808 [RELURL] of
in both the HTML markup and the Content-Location header before matching relative URIs in both the HTML markup and the Content-Location header
a hyperlink in the HTML markup to a Content-Location header. The same before matching a hyperlink in the HTML markup to a Content-Location
applies if the Content-Location contains an absolute URL, and the HTML header. The same applies if the Content-Location contains an absolute
markup contains a BASE element so that relative URL-s in the HTML markup URI, and the HTML markup contains a BASE element so that relative URIs
can be resolved. in the HTML markup can be resolved.
If there is NO Content-Base header, and the Content-Location header If there is NO Content-Base header, and the Content-Location header
contains a relative URL, then NO relative to absolute resolution SHOULD contains a relative URI, then NO relative to absolute resolution SHOULD
be performed (even if there is a BASE element in the HTML markup), and be performed (even if there is a BASE specification, such as the BASE
exact textual match of the relative URL-s in the Content-Location and element in HTML, in the body part containing the URI), and exact textual
the HTML markup is performed instead (after removal of LWSP introduced match of the relative URI-s in the Content-Location and the HTML markup
as described in section 4.4 above). is performed instead (after removal of LWSP introduced as described in
section 4.4 above).
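The matching rules of this section can be sketched in Python as follows. The sketch is illustrative only and not part of either draft revision. The function names are hypothetical. Using the base found inside the body part in preference to Content-Base when resolving the link itself is an assumption based on section 5.

```python
from urllib.parse import urljoin, urlsplit


def strip_lwsp(value):
    """Remove white space that header folding may have introduced."""
    return "".join(value.split())


def link_matches(link, content_location, content_base=None, body_base=None):
    """Decide whether a link in the markup refers to a given body part."""
    location = strip_lwsp(content_location)
    if content_base:
        # Content-Base present: resolve both sides before comparing.
        link_base = body_base or content_base
        return urljoin(link_base, link) == urljoin(content_base, location)
    if urlsplit(location).scheme:
        # Absolute Content-Location: resolve the link with the body's base.
        resolved = urljoin(body_base, link) if body_base else link
        return resolved == location
    # Relative Content-Location and no Content-Base: exact textual match,
    # even if the body part carries its own base specification.
    return link == location
```

With a Content-Base of "http://www.ietf.cnri.reston.va.us/images/", the link "ietflogo.gif" matches a body part whose Content-Location is either "ietflogo.gif" or the full absolute URI of the image.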
The URI in the Content-Location header need not refer to an object which The URI in the Content-Location header need not refer to an object which
is actually available globally for retrieval using this URI (after is actually available globally for retrieval using this URI (after
resolution of relative URIs). resolution of relative URIs). However, URI-s in Content-Location headers
(if absolute, or resolvable to absolute URIs) SHOULD still be globally
unique.
8.3 Use of the Content-ID header and CID URLs 8.3 Use of the Content-ID header and CID URLs
When CID (Content-ID) URLs as defined in RFC 1738 [URL] and RFC 1873 When CID (Content-ID) URLs as defined in RFC 1738 [URL] and RFC 1873
[MIDCID] are used for links between body parts, the Content-Location [MIDCID] are used for links between body parts, the Content-Location
statement will normally be replaced by a Content-ID header. Thus, the statement will normally be replaced by a Content-ID header. Thus, the
following two headers are identical in meaning: following two headers are identical in meaning:
Content-ID: foo@bar.net Content-ID: foo@bar.net
Content-Location: CID: foo@bar.net Content-Location: CID: foo@bar.net
Note: Content-IDs MUST be globally unique [MIME1]. It is thus not Note: Content-IDs MUST be globally unique [MIME1]. It is thus not
permitted to make them unique only within this message or within this permitted to make them unique only within this message or within this
multipart/related. multipart/related.
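A receiver might compare a CID URL in the markup with a Content-ID header as in the Python sketch below. It is illustrative only and not part of either draft revision. The function names are hypothetical. Stripping angle brackets from the header value and undoing URL escaping are assumptions drawn from general MIME practice, not from the text above.

```python
from urllib.parse import unquote


def content_id_from_cid_url(url):
    """Return the Content-ID named by a CID URL, or None for other URLs."""
    scheme, separator, rest = url.partition(":")
    if not separator or scheme.strip().lower() != "cid":
        return None
    # Tolerate white space after the colon and undo URL escaping.
    return unquote(rest.strip())


def refers_to(url, content_id_header):
    """True if the CID URL names the body part with this Content-ID."""
    target = content_id_from_cid_url(url)
    if target is None:
        return False
    return target == content_id_header.strip().strip("<>")
```

Under these assumptions, refers_to("cid:foo@bar.net", "<foo@bar.net>") holds, while an ordinary HTTP URL never matches a Content-ID.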
9 Examples 9 Examples
9.1 Example of a HTML body without included linked objects 9.1 Example of a HTML body without included linked objects
The first example is the simplest form of an HTML email message. This is The first example is the simplest form of an HTML email message. This is
not an aggregate HTML object, but simply one by itself. This message not an aggregate HTML object, but simply a message with a single HTML
contains a hyperlink but does not provide the ability to resolve the body part. This message contains a hyperlink but does not provide the
hyperlink. To resolve the hyperlink the receiving client would need ability to resolve the hyperlink. To resolve the hyperlink the receiving
either IP access to the Internet, or an electronic mail web gateway. client would need either IP access to the Internet, or an electronic
mail web gateway.
From: foo1@bar.net From: foo1@bar.net
To: foo2@bar.net To: foo2@bar.net
Subject: A simple example Subject: A simple example
Mime-Version: 1.0 Mime-Version: 1.0
Content-Type: Text/HTML; charset=US-ASCII Content-Type: Text/HTML; charset=US-ASCII
<HTML> <HTML>
<head></head> <head></head>
<body> <body>
<h1>Hi there!</h1> <h1>Hi there!</h1>
An example of an HTML message.<p> An example of an HTML message.<p>
Try clicking <a href="http://www.resnova.com/">here.</a><p> Try clicking <a href="http://www.resnova.com/">here.</a><p>
</body></HTML> </body></HTML>
9.2 Example with absolute URIs to an embedded GIF picture: 9.2 Example with absolute URIs to an embedded GIF picture:
From: foo1@bar.net From: foo1@bar.net
To: foo2@bar.net To: foo2@bar.net
Subject: A simple example Subject: A simple example
Mime-Version: 1.0 Mime-Version: 1.0
Content-Type: Multipart/related; boundary="boundary-example-1"; Content-Type: multipart/related; boundary="boundary-example-1";
type=Text/HTML; start=foo3*foo1@bar.net type=Text/HTML; start=foo3*foo1@bar.net
--boundary-example-1 --boundary-example-1
Content-Type: Text/HTML;charset=US-ASCII Content-Type: Text/HTML;charset=US-ASCII
Content-ID: foo3*foo1@bar.net Content-ID: foo3*foo1@bar.net
... text of the HTML document, which might contain a hyperlink ... text of the HTML document, which might contain a hyperlink
to the other body part, for example through a statement such as: to the other body part, for example through a statement such as:
<IMG SRC="http://www.ietf.cnri.reston.va.us/images/ietflogo.gif" <IMG SRC="http://www.ietf.cnri.reston.va.us/images/ietflogo.gif"
ALT="IETF logo"> ALT="IETF logo">
skipping to change at page 11, line 49 skipping to change at page 13, line 14
--boundary-example-1-- --boundary-example-1--
9.3 Example with relative URIs to an embedded GIF picture 9.3 Example with relative URIs to an embedded GIF picture
From: foo1@bar.net From: foo1@bar.net
To: foo2@bar.net To: foo2@bar.net
Subject: A simple example Subject: A simple example
Mime-Version: 1.0 Mime-Version: 1.0
Content-Base: "http://www.ietf.cnri.reston.va.us" Content-Base: "http://www.ietf.cnri.reston.va.us"
Content-Type: Multipart/related; boundary="boundary-example-1"; Content-Type: multipart/related; boundary="boundary-example-1";
type=Text/HTML type=Text/HTML
--boundary-example-1 --boundary-example-1
Content-Type: Text/HTML; charset=ISO-8859-1 Content-Type: Text/HTML; charset=ISO-8859-1
Content-Transfer-Encoding: QUOTED-PRINTABLE Content-Transfer-Encoding: QUOTED-PRINTABLE
... text of the HTML document, which might contain a hyperlink ... text of the HTML document, which might contain a hyperlink
to the other body part, for example through a statement such as: to the other body part, for example through a statement such as:
<IMG SRC="/images/ietflogo.gif" ALT="IETF logo"> <IMG SRC="/images/ietflogo.gif" ALT="IETF logo">
Example of a copyright sign encoded with Quoted-Printable: =A9 Example of a copyright sign encoded with Quoted-Printable: =A9
Example of a copyright sign mapped onto HTML markup: &#168; Example of a copyright sign mapped onto HTML markup: &#168;
--boundary-example-1 --boundary-example-1
Content-Location: "/images/ietflogo.gif" Content-Location: "/images/ietflogo.gif"
Content-Type: IMAGE/GIF Content-Type: IMAGE/GIF
Content-Transfer-Encoding: BASE64 Content-Transfer-Encoding: BASE64
R0lGODlhGAGgAPEAAP/////ZRaCgoAAAACH+PUNvcHlyaWdodCAoQykgMTk5 R0lGODlhGAGgAPEAAP/////ZRaCgoAAAACH+PUNvcHlyaWdodCAoQykgMTk5
NSBJRVRGLiBVbmF1dGhvcml6ZWQgZHVwbGljYXRpb24gcHJvaGliaXRlZC4A NSBJRVRGLiBVbmF1dGhvcml6ZWQgZHVwbGljYXRpb24gcHJvaGliaXRlZC4A
etc... etc...
--boundary-example-1-- --boundary-example-1--
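How the relative URI in example 9.3 is matched to its body part can be sketched in Python. This is a rough illustration, not the drafts' normative algorithm: the helper name absolute_location is hypothetical, and stripping the surrounding double quotes from the header values is an assumption based on how the example writes them.

```python
from urllib.parse import urljoin


def absolute_location(content_base, location):
    """Resolve a (possibly relative) URI against a Content-Base value.

    Hypothetical helper: removes the double quotes used in example 9.3
    and applies standard relative-URL resolution.
    """
    base = content_base.strip().strip('"')
    loc = location.strip().strip('"')
    return urljoin(base, loc)


# Header values and the IMG SRC attribute as they appear in example 9.3.
content_base = '"http://www.ietf.cnri.reston.va.us"'
img_src = "/images/ietflogo.gif"
part_content_location = '"/images/ietflogo.gif"'

resolved_src = absolute_location(content_base, img_src)
resolved_part = absolute_location(content_base, part_content_location)

# The reference resolves to the body part when both absolute forms agree.
print(resolved_src == resolved_part)
```

Comparing absolute forms rather than the raw strings means a reference and a Content-Location that are written differently, but resolve to the same URI, still match.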
9.4 Example using CID URL and Content-ID header to an embedded GIF 9.4 Example using CID URL and Content-ID header to an embedded GIF
picture picture
From: foo1@bar.net From: foo1@bar.net
To: foo2@bar.net To: foo2@bar.net
Subject: A simple example Subject: A simple example
Mime-Version: 1.0 Mime-Version: 1.0
Content-Type: Multipart/related; boundary="boundary-example-1"; Content-Type: multipart/related; boundary="boundary-example-1";
type=Text/HTML type=Text/HTML
--boundary-example-1 --boundary-example-1
Content-Type: Text/HTML; charset=US-ASCII Content-Type: Text/HTML; charset=US-ASCII
... text of the HTML document, which might contain a hyperlink ... text of the HTML document, which might contain a hyperlink
to the other body part, for example through a statement such as: to the other body part, for example through a statement such as:
<IMG SRC="cid:foo4*foo1@bar.net" ALT="IETF logo"> <IMG SRC="cid:foo4*foo1@bar.net" ALT="IETF logo">
--boundary-example-1 --boundary-example-1
Content-ID: foo4*foo1@bar.net Content-ID: foo4*foo1@bar.net
Content-Type: IMAGE/GIF Content-Type: IMAGE/GIF
Content-Transfer-Encoding: BASE64 Content-Transfer-Encoding: BASE64
R0lGODlhGAGgAPEAAP/////ZRaCgoAAAACH+PUNvcHlyaWdodCAoQykgMTk5 R0lGODlhGAGgAPEAAP/////ZRaCgoAAAACH+PUNvcHlyaWdodCAoQykgMTk5
NSBJRVRGLiBVbmF1dGhvcml6ZWQgZHVwbGljYXRpb24gcHJvaGliaXRlZC4A NSBJRVRGLiBVbmF1dGhvcml6ZWQgZHVwbGljYXRpb24gcHJvaGliaXRlZC4A
etc... etc...
--boundary-example-1-- --boundary-example-1--
10. Content-Disposition header 10. Content-Disposition header
Note the specification in [REL] on the relations between Content- Note the specification in [REL] on the relations between Content-
Disposition and Multipart/Related. Disposition and multipart/related.
11. Encoding Considerations for HTML bodies 11. Character encoding issues
11.1 Character set issues 11.1 Character set issues
A mail user agent that is composing a message using HTML has a choice in For the encoding of characters in an HTML document into a MIME-
how to represent and subsequently encode characters for the transmission compatible octet stream, the following three mechanisms are relevant:
of the mail message.
However, there are some differences as to the default character
encoding, specified by the MIME "charset" parameter. If this parameter
is omitted: When transferred through HTTP, the default is [HTTP]:
content-type: Text/HTML; charset=ISO-8859-1
When transferred via e-mail, the default is [MIME1]:
content-type: Text/HTML; charset=US-ASCII
To avoid confusion, the MIME Content-Type parameter for Text/HTML SHOULD
always include a charset value, and not rely on the MIME e-mail default
of US-ASCII if no charset value is specified.
When sending HTML via MIME e-mail, three layers of encoding are relevant
as shown in Figure 1:
Displayed text Displayed text
| ^
V |
+-------------+ +----------------+
| HTML editor | | HTML viewer |
| | | or Web browser |
+-------------+ +----------------+
| ^
V |
HTML markup HTML markup
| ^
V |
+------------------+ +-------------------+
| MIME content- | | MIME content- |
| transfer-encoder | | transfer-encoder |
+------------------+ +-------------------+
| ^
V +-----------+ |
transfer-encoding--->| Transport |-->transfer encoding
+-----------+
Figure 1
Definitions (see Figure 1):
Displayed text A visual representation of the intended text.
HTML markup A sequence of characters formatted according to the
HTML specification [HTML2].
MIME encoding A sequence of octets physically forwarded via e-mail,
may include MIME content-transfer-encoding as specified
in [MIME1].
HTML editor Software used to produce HTML markup.
MIME content- Software used to encode and decode non-US-ASCII
transfer-encoder characters according to the MIME standard.
HTML viewer Software used to display HTML documents to recipients.
Note: Real implementations need not split functions into different - HTML [HTML2] as an application of SGML [SGML] allows characters to be
modules as described above. The figure above is a logical model in order denoted by character entities as well as by numeric character
to explain how rewriting and transport is done. references
(e.g. "latin small letter a with acute" may be represented by
"&aacute;"
or "&#225;") in the HTML markup.
If the displayed text contains non-US-ASCII characters, these characters - HTML documents, in common with other documents of the MIME content-
might have to be rewritten if the transport (as is common in e-mail) is type
set to handle only 7-bit characters. text, can use various kinds of character encodings which are indicated
by the value of the "charset" parameter in the MIME content-type
header.
For the exact meaning and use of the "charset" parameter, please see
[MIME1 section 7.1.1]. Note that the "charset" parameter refers to the
charset in the HTML markup, not to the charset in the displayed text.
Thus, if the HTML markup contains only US-ASCII characters, then the
value of the charset parameter should be US-ASCII, even if the HTML
markup contains entities which cause the displayed text to show
non-US-ASCII-characters.
HTML markup allows some characters at the displayed text level to be - Any documents including HTML documents that contain octet values
represented using either entity references or numeric character outside
references (as defined in [HTML2] section 3.2.1). For example, a "small the 7-bit range or that contain bare CRs or bare LFs need a content-
a, acute accent" may be represented by the entity reference "&aacute;" transfer-encoding applied before transmission over certain transport
or the numeric character reference "&#255;". Alternatively, the same protocols [MIME1, chapter 5].
character might appear directly in the HTML document, but for
transmission through MIME 7-bit-systems, the entire HTML document is
encoded using a Content-Transfer-Encoding (as defined in [MIME1] section
5).
In sending a message containing non US-ASCII characters, both these The above three mechanisms are well defined and documented, and
rewriting methods MAY be used, and any mixture of them MAY occur when therefore not further explained here. In sending a message, all the
sending the document via e-mail. Receiving mailers (together with the abovementioned mechanisms MAY be used, and any mixture of them MAY occur
Web browser they may use to display the document) MUST be capable of when sending the document via e-mail. Receiving mailers (together with
handling any combinations of these rewriting methods. the Web browser they may use to display the document) MUST be capable of
handling any combinations of these mechanisms.
The value of the charset attribute of the Content-Type header field Some transport mechanisms may specify a default "charset" parameter if
should be US-ASCII if and only if the HTML markup contains only US-ASCII none is supplied [HTTP, MIME1]. Because the default differs for different
characters (even if the displayed text contains non-US-ASCII mechanisms, when HTML is transferred through mail, the charset parameter
characters). SHOULD be included, rather than relying on the default.
Example of non-US-ASCII characters in HTML: See section 9.3 above. Example of non-US-ASCII characters in HTML: See section 9.3 above.
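The three mechanisms of section 11.1 can be seen side by side in a short Python sketch. It is illustrative only and uses "latin small letter a with acute" as the sample character; the variable names are assumptions of the sketch, not terms from the drafts.

```python
import html
import quopri

displayed = "\u00e1"  # latin small letter a with acute

# Mechanism 1: the character is written as HTML markup, either as a
# character entity or as a numeric character reference. Such markup
# consists of US-ASCII characters only, so charset=US-ASCII suffices.
entity_markup = "&aacute;"
numeric_markup = "&#225;"
from_entity = html.unescape(entity_markup)
from_numeric = html.unescape(numeric_markup)
ascii_octets = numeric_markup.encode("us-ascii")

# Mechanism 2: the character appears directly in the markup, and the
# charset parameter names the encoding used (ISO-8859-1 here).
latin1_octets = displayed.encode("iso-8859-1")

# Mechanism 3: the resulting 8-bit octets are given a
# content-transfer-encoding (Quoted-Printable) for 7-bit transports.
qp_octets = quopri.encodestring(latin1_octets)

# A receiver reverses mechanisms 3 and 2 to recover the same character.
round_trip = quopri.decodestring(qp_octets).decode("iso-8859-1")
print(from_entity == from_numeric == round_trip == displayed)
```

Because a sender may mix these mechanisms within one document, a receiver has to undo the transfer encoding, apply the charset, and only then expand entities and numeric references.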
11.2 Line break characters 11.2 Line break characters
The MIME standard [MIME1] specifies that line breaks in the MIME The MIME standard [MIME1] specifies that line breaks in the MIME content
encoding (see figure 1) MUST be CRLF. The HTTP standard [HTTP] specifies MUST be CRLF. The HTTP standard [HTTP] specifies that line breaks in
that line breaks in transported HTML markup (see figure 2) may be either transported HTML markup may be either bare CRs, bare LFs or CRLFs. To
bare CRs, bare LFs or CRLFs. To allow data integrity checks through allow data integrity checks through checksums, MIME content-transfer-
checksums, MIME encoding of line breaks SHOULD be such that after encoding of line breaks SHOULD, if necessary, be used so that after
decoding, the line break representation of the original HTML markup is decoding, the line break representation of the original HTML markup is
returned. returned.
Note that the mail content-MD5 is defined on a canonical form with Note that the mail content-MD5 is defined on a canonical form with
all line breaks converted to CRLF, while the HTTP content-MD5 is defined all line breaks converted to CRLF, while the HTTP content-MD5 is defined
to apply to the transmitted form. This means that the Content-MD5 HTTP to apply to the transmitted form. This means that the Content-MD5 HTTP
header may not be correct for Text/HTML that is retrieved from an HTTP header may not be correct for Text/HTML that is retrieved from an HTTP
server and then sent via mail. server and then sent via mail.
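The Content-MD5 mismatch described in section 11.2 can be reproduced with a small Python sketch. It assumes a document served with bare LF line breaks; the helper names content_md5 and canonicalize_line_breaks are hypothetical, and the sample markup is invented for the illustration.

```python
import base64
import hashlib


def content_md5(octets):
    """Return the base64-encoded MD5 digest of the given octets."""
    return base64.b64encode(hashlib.md5(octets).digest()).decode("ascii")


def canonicalize_line_breaks(octets):
    """Convert CRLF, bare CR and bare LF line breaks to CRLF."""
    unified = octets.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", b"\r\n")


# Assumed sample: HTML markup as transmitted by an HTTP server, bare LFs.
as_served = b"<HTML>\n<body>Hi there!</body>\n</HTML>\n"

http_digest = content_md5(as_served)  # digest of the transmitted form
mail_digest = content_md5(canonicalize_line_breaks(as_served))  # canonical form

# The digests differ although the document is logically the same.
print(http_digest != mail_digest)
```

A gateway that forwards retrieved HTML by mail would therefore need to recompute the digest over the canonical form rather than copy the HTTP header value.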
12. Security Considerations 12. Security Considerations
skipping to change at page 15, line 36 skipping to change at page 15, line 53
documents containing such URLs. documents containing such URLs.
One way of implementing messages with linked body parts is to handle the One way of implementing messages with linked body parts is to handle the
linked body parts in a combined mail and WWW proxy server. The mail linked body parts in a combined mail and WWW proxy server. The mail
client is only given the start body part, which it passes to a web client is only given the start body part, which it passes to a web
browser. This web browser requests the linked parts from the proxy browser. This web browser requests the linked parts from the proxy
server. If this method is used, and if the combined server is used by server. If this method is used, and if the combined server is used by
more than one user, then methods must be employed to ensure that body more than one user, then methods must be employed to ensure that body
parts of a message to one person are not retrievable by another person. parts of a message to one person are not retrievable by another person.
Use of passwords (also known as tickets or magic cookies) is one way of Use of passwords (also known as tickets or magic cookies) is one way of
achieving this. Note that some caching HTML proxy servers may not achieving this. Note that some caching WWW proxy servers may not
distinguish between cached objects from e-mail and HTTP, which may be a distinguish between cached objects from e-mail and HTTP, which may be a
security risk. security risk.
In addition, by allowing people to mail aggregate HTML objects, we are In addition, by allowing people to mail aggregate objects, we are
opening the door to other potential security problems that until now opening the door to other potential security problems that until now
were only problems for WWW users. For example, some HTML documents now were only problems for WWW users. For example, some HTML documents now
either themselves contain executable content (JavaScript) or contain either themselves contain executable content (JavaScript) or contain
links to executable content (The "INSERT" specification, Java). It would links to executable content (The "INSERT" specification, Java). It would
be exceedingly dangerous for a receiving User Agent to execute content be exceedingly dangerous for a receiving User Agent to execute content
received through a mail message without careful attention to received through a mail message without careful attention to
restrictions on the capabilities of that executable content. restrictions on the capabilities of that executable content.
13. Acknowledgments 13. Acknowledgments
Harald T. Alvestrand, Richard Baker, Dave Crocker, Martin J. Duerst, Harald T. Alvestrand, Richard Baker, Dave Crocker, Martin J. Duerst,
Lewis Geer, Roy Fielding, Al Gilman, Paul Hoffman, Richard W. Jesmajian, Lewis Geer, Roy Fielding, Al Gilman, Paul Hoffman, Richard W. Jesmajian,
Mark K. Joseph, Greg Herlihy, Valdis Kletnieks, Daniel LaLiberte, Ed Mark K. Joseph, Greg Herlihy, Valdis Kletnieks, Daniel LaLiberte, Ed
Levinson, Jay Levitt, Albert Lunde, Larry Masinter, Keith Moore, Gavin Levinson, Jay Levitt, Albert Lunde, Larry Masinter, Keith Moore, Gavin
Nicol, Pete Resnick, Jon Smirl, Einar Stefferud, Jamie Zawinski and Nicol, Pete Resnick, Jon Smirl, Einar Stefferud, Jamie Zawinski, Steve
several other people have helped us with preparing this document. I Zilles and several other people have helped us with preparing this
alone take responsibility for any errors which may still be in the document. I alone take responsibility for any errors which may still be
document. in the document.
14. References 14. References
Ref. Author, title Ref. Author, title
--------- -------------------------------------------------------- --------- --------------------------------------------------------
[CONDISP] R. Troost, S. Dorner: "Communicating Presentation [CONDISP] R. Troost, S. Dorner: "Communicating Presentation
Information in Internet Messages: The Content- Information in Internet Messages: The Content-
Disposition Header", RFC 1806, June 1995. Disposition Header", RFC 1806, June 1995.
skipping to change at page 16, line 49 skipping to change at page 17, line 5
Describing the Format of Internet Message Bodies", RFC Describing the Format of Internet Message Bodies", RFC
1521, Sept 1993. 1521, Sept 1993.
[MIME2] N. Borenstein & N. Freed: "Multipurpose Internet Mail [MIME2] N. Borenstein & N. Freed: "Multipurpose Internet Mail
Extensions (MIME) Part Two: Media Types". draft-ietf- Extensions (MIME) Part Two: Media Types". draft-ietf-
822ext-mime-imt-02.txt, December 1995. 822ext-mime-imt-02.txt, December 1995.
[NEWS] M.R. Horton, R. Adams: "Standard for interchange of [NEWS] M.R. Horton, R. Adams: "Standard for interchange of
USENET messages", RFC 1036, December 1987. USENET messages", RFC 1036, December 1987.
[PDF] Bienz, T., Cohn, R. and Meehan, J.: "Portable Document
Format Reference Manual, Version 1.1", Adobe Systems
Inc.
[REL] Harald Tveit Alvestrand, Edward Levinson: "The MIME [REL] Harald Tveit Alvestrand, Edward Levinson: "The MIME
Multipart/Related Content-type", <draft-levinson- Multipart/Related Content-type", <draft-levinson-
multipart-related-00.txt>, January 1995. multipart-related-00.txt>, January 1995.
[RELURL] R. Fielding: "Relative Uniform Resource Locators", RFC [RELURL] R. Fielding: "Relative Uniform Resource Locators", RFC
1808, June 1995. 1808, June 1995.
[RFC822] D. Crocker: "Standard for the format of ARPA Internet [RFC822] D. Crocker: "Standard for the format of ARPA Internet
text messages." STD 11, RFC 822, August 1982. text messages." STD 11, RFC 822, August 1982.
[SGML] ISO 8879. Information Processing -- Text and Office -
Standard Generalized Markup Language (SGML),
1986. <URL:http://www.iso.ch/cate/d16387.html>
[SMTP] J. Postel: "Simple Mail Transfer Protocol", STD 10, RFC [SMTP] J. Postel: "Simple Mail Transfer Protocol", STD 10, RFC
821, August 1982. 821, August 1982.
[URL] T. Berners-Lee, L. Masinter, M. McCahill: "Uniform [URL] T. Berners-Lee, L. Masinter, M. McCahill: "Uniform
Resource Locators (URL)", RFC 1738, December 1994. Resource Locators (URL)", RFC 1738, December 1994.
[URLBODY] N. Freed and Keith Moore: "Definition of the URL MIME [URLBODY] N. Freed and Keith Moore: "Definition of the URL MIME
External-Body Access-Type", draft-ietf-mailext-acc-url- External-Body Access-Type", draft-ietf-mailext-acc-url-
01.txt, November 1995. 01.txt, November 1995.
skipping to change at page 17, line 31 skipping to change at page 17, line 45
Alex Hopmann. Alex Hopmann.
Jacob Palme Phone: +46-8-16 16 67 Jacob Palme Phone: +46-8-16 16 67
Stockholm University and KTH Fax: +46-8-783 08 29 Stockholm University and KTH Fax: +46-8-783 08 29
Electrum 230 E-mail: jpalme@dsv.su.se Electrum 230 E-mail: jpalme@dsv.su.se
S-164 40 Kista, Sweden S-164 40 Kista, Sweden
Alex Hopmann Alex Hopmann
President President
ResNova Software, Inc. E-mail: alex.hopmann@resnova.com ResNova Software, Inc. E-mail: alex.hopmann@resnova.com
5011 Argosy Dr. #13 5011 Argosy Dr. #13
Huntington Beach, CA 92649 Huntington Beach, CA 92649
Working group chairman: Working group chairman: Einar Stefferud <stef@nma.com>
Einar Stefferud <stef@nma.com>
 End of changes. 62 change blocks. 
201 lines changed or deleted 194 lines changed or added
