[Docs] [txt|pdf] [Tracker] [Email] [Nits]

Versions: 00 01 02 03 draft-ietf-appsawg-malformed-mail

Individual submission                                       M. Kucherawy
Internet-Draft                                           Cloudmark, Inc.
Intended status: BCP                                    November 7, 2010
Expires: May 11, 2011

         Best Current Practices for Handling Malformed Messages


   The email ecosystem has long had a very permissive set of common
   processing rules in place despite increasingly rigid standards
   governing its components, ostensibly to improve the user experience.
   These come at some cost, and various components are faced with
   decisions about whether or not to permit non-conforming messages to
   continue toward their destinations unaltered, adjust them to conform
   (possibly at the cost of losing some of the original message), or
   outright rejecting them.

   This memo includes a collection of the best current practices in a
   variety of such situations, to be used as implementation guidance.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 11, 2011.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of

Kucherawy                 Expires May 11, 2011                  [Page 1]

Internet-Draft             Mailformed Mail BCP             November 2010

   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . . . 3
   2.  Keywords  . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
   3.  Background  . . . . . . . . . . . . . . . . . . . . . . . . . . 3
   4.  Internal Representations  . . . . . . . . . . . . . . . . . . . 3
   5.  Header Anomalies  . . . . . . . . . . . . . . . . . . . . . . . 4
     5.1.  Non-Header Lines  . . . . . . . . . . . . . . . . . . . . . 4
     5.2.  Header Malformations  . . . . . . . . . . . . . . . . . . . 5
     5.3.  Header Field Counts . . . . . . . . . . . . . . . . . . . . 6
   6.  MIME Anomalies  . . . . . . . . . . . . . . . . . . . . . . . . 7
     6.1.  Missing MIME-Version Field  . . . . . . . . . . . . . . . . 7
   7.  IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 7
   8.  Security Considerations . . . . . . . . . . . . . . . . . . . . 7
   9.  References  . . . . . . . . . . . . . . . . . . . . . . . . . . 8
     9.1.  Normative References  . . . . . . . . . . . . . . . . . . . 8
     9.2.  Informative References  . . . . . . . . . . . . . . . . . . 8
   Appendix A.  Examples . . . . . . . . . . . . . . . . . . . . . . . 8
   Appendix B.  Acknowledgements . . . . . . . . . . . . . . . . . . . 8

Kucherawy                 Expires May 11, 2011                  [Page 2]

Internet-Draft             Mailformed Mail BCP             November 2010

1.  Introduction

   The history of email standards, going back to [RFC822] and beyond,
   contains a fairly rigid evolution of specifications.  But
   implementations within that culture have also long had an
   undercurrent known formally as the robustness principle, but also
   known informally as Postel's Law: "Be conservative in what you do, be
   liberal in what you accept from others."

   In general, this served the email ecosystem well by allowing a few
   errors in implementations without obstructing participation in the
   game.  The proverbial bar was set low.  However, as we have evolved
   into the current era, some of these lenient stances have begun to
   expose opportunities that can be exploited by malefactors.  Various
   email-based applications rely on strong application of these
   standards for simple security checks, while the very basic building
   blocks of that infrastructure, intending to be robust, fail to assert
   those standards.

   This memo presents some areas in which the more lenient stances can
   provide vectors for attack, and then presents the collected wisdom of
   numerous applications in and around the email ecosystem for dealing
   with them to mitigate their impact.

2.  Keywords

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   document are to be interpreted as described in [KEYWORDS].

3.  Background

   The reader would benefit from reading [EMAIL-ARCH] for some general
   background about the overall email architecture.  Of particular
   interest is the Internet Message Format, detailed in [MAIL].
   Throughout this document, the use of the term "messsage" should be
   assumed to mean a block of text conforming to the Internet Message

4.  Internal Representations

   Any agent handling a message could have one or two distinct
   representations of a message it is handling.  One is an internal
   representation, such as a block of storage used for the header and a
   block for the body.  These may be sorted, encoded, decoded, etc. as
   per the needs of that particular module.  The other is the
   representation that is output to the next agent in the handling
   chain.  This might be identical to the version that is input to the

Kucherawy                 Expires May 11, 2011                  [Page 3]

Internet-Draft             Mailformed Mail BCP             November 2010

   module, or it might have some changes such as added or reordered
   header fields, body modifications to remove malicious content, etc.

   In some cases, advice is provided only for internal representations.
   However, there is often occasion to mandate changes to the output as

5.  Header Anomalies

   This section covers common syntactical and semantic anomalies found
   in headers of messages, and presents preferred mitigations.

5.1.  Non-Header Lines

   It has been observed that some messages contain a line of text in the
   header that is not a valid message header field of any kind.  For

       From: user@example.com
       To: userpal@example.net
       Subject: This is your reminder
       about the football game tonight
       Date: Wed, 20 Oct 2010 20:53:35 -0400

       Don't forget to meet us for the tailgate party!

   The cause of this is generally a bug in a message generator of some
   kind.  If the fourth line was intended to be a continuation of the
   third, it should be indented by whitespace as set out in Section
   2.2.3 of [MAIL].

   This anomaly has varying impacts on processing software, depending on
   the implementation:

   1.  some agents choose to separate the header of the message from the
       body only at the first empty line (i.e. a CRLF immediately
       followed by another CRLF);

   2.  some agents assume this anomaly should be interpreted to mean the
       body starts at line four, as the end of the header is assumed by
       encountering something that is not a valid header field or folded
       portion thereof;

   3.  some agents assume this should be interpreted as an intended
       header folding as described above;

   4.  some agents reject this outright as line four is neither a valid
       header field nor a folded continuation of a header field prior to

Kucherawy                 Expires May 11, 2011                  [Page 4]

Internet-Draft             Mailformed Mail BCP             November 2010

       an empty line.

   This can be exploited if it is known that one message handling agent
   will take one action while the next agent in the handling chain will
   take another.  For example, a filter trained to detect malicious body
   anomalies (e.g. references to dangerous web sites) that is fed by a
   Mail Transfer Agent (MTA) implementing (1) above might not get the
   opportunity to identify something dangerous in a message if it is
   unaware of the anomaly and does not itself check for it.

   Consensus indicates the preferred implementation is to terminate
   header processing before the first character in line four, as
   described in (2) above.  Thus, a module compliant with this
   specification MUST terminate header processing upon encountering the
   first line of text that is not a valid header field.  That is, all
   data after that point in the input MUST NOT be considered part of the
   header of the message.  If that line is not an empty line, an empty
   line MUST be inserted at that point in the emitted version of the
   message being processed.

   It should be noted that a few implementations make choice (4) above
   since any reputable message generation program will get header
   folding right, and thus anything so blatant as this malformation is
   likely an error caused by a malefactor.

5.2.  Header Malformations

   There are various malformations that exist.  A common one is
   insertion of whitespace at unusual locations, such as:

       From: user@example.com
       To: userpal@example.net
       Subject: This is your reminder
       MIME-Version : 1.0
       Content-Type: text/plain
       Date: Wed, 20 Oct 2010 20:53:35 -0400

       Don't forget to meet us for the tailgate party!

   Note the addition of whitespace in line four after the header field
   name but before the colon that separates the name from the value.

   The acceptance grammar of [MAIL] permits that extra whitespace, so it
   cannot be considered invalid.  However, a consensus of
   implementations prefers to remove that whitespace.  There is no
   perceived change to the semantics of the header field being altered
   as the whitespace is itself semantically meaningless.  Thus, a module
   compliant with this memo MUST remove all whitespace after the field

Kucherawy                 Expires May 11, 2011                  [Page 5]

Internet-Draft             Mailformed Mail BCP             November 2010

   name but before the colon, and MUST emit that version of that field
   on output.

5.3.  Header Field Counts

   Section 3.6 of [MAIL] prescribes specific header field counts for a
   valid message.  Few agents actually enforce these in the sense that a
   message whose header contents exceed one or more limits set there are
   generally allowed to pass; they may add any required fields that are
   missing, however.

   Also, few agents that use messages as input, including Mail User
   Agents (MUAs) that actually display messages to users, verify that
   the input is valid before proceeding.  Two popular open source
   filtering programs and two popular Mailing List Management (MLM)
   packages examined at the time this memo was drafted select either the
   first or last instance of a particular field name, such as From, to
   decide who sent a message.  Absent enforcement of [MAIL], an attacker
   can craft a message with multiple fields if that attacker knows the
   filter will make a decision based on one but the user will be shown
   the other.

   This situation is exacerbated when a claim of message validity is
   inferred by something like a valid [DKIM] signature.  Such a
   signature might cover one instance of a constrained field but not
   another, and a naive consumer of DKIM's output, not realizing which
   one was covered by a valid signature, presume the wrong one was the
   "good" one.  An MUA, for example could show the first of two From
   fields as "good" or "safe" while the DKIM signature actually only
   verified the second.

   Thus, an agent compliant with this specification MUST enact one of
   the following:

   1.  reject outright or refuse to process further any input message
       that does not conform to Section 3.6 of [MAIL];

   2.  remove or, in the case of an MUA, refuse to render any instances
       of a header field whose presence exceeds a limit prescribed in
       Section 3.6 of [MAIL] when generating its output;

   3.  alter the name of any header field whose presence exceeds a limit
       prescribed in Section 3.6 of [MAIL] when generating its outputso
       that later agents can produce a consistent result.

Kucherawy                 Expires May 11, 2011                  [Page 6]

Internet-Draft             Mailformed Mail BCP             November 2010

6.  MIME Anomalies

   [MIME], et seq, define a mechanism of message extensions for
   providing text in character sets other than ASCII, non-text
   attachments to messages, multi-part message bodies and similar

   Some anomalies with MIME-compliant generation are also common.  This
   section discusses some of those and presents preferred mitigations.

6.1.  Missing MIME-Version Field

   Any message that uses [MIME] constructs is required to have a MIME-
   Version header field.  Without them, the Content-Type and associated
   fields have no semantic meaning.

   It is often observed that a message has complete MIME structure, yet
   lacks this header field.

   As described at the end of Section 5.1, this is not expected from a
   reputable content generator and is often an indication of mass-
   produced spam or other undesirable messages.

   Therefore, an agent compliant with this specification MUST internally
   enact one or more of the following in the absence of a MIME-Version
   header field:

   1.  Ignore all other MIME-specific fields, even if they are
       syntactically valid, thus treating the entire message as a
       single-part message of type text/plain;

   2.  Remove all other MIME-specific fields, even if they are
       syntactically valid, both internally and when emitting the output
       version of the message;

   3.  Rename all other MIME-specific fields, even if they are
       syntactically valid, both internally and when emitting the output
       version of the message.

7.  IANA Considerations

   This memo contains no actions for IANA.

8.  Security Considerations

   The discussions of the anomalies above and their prescribed solutions
   are themselves security considerations.  The practises enumerated in
   this memo are generally perceived to resolve security considerations

Kucherawy                 Expires May 11, 2011                  [Page 7]

Internet-Draft             Mailformed Mail BCP             November 2010

   that already exist rather than introducing new ones.

9.  References

9.1.  Normative References

   [KEYWORDS]    Bradner, S., "Key words for use in RFCs to Indicate
                 Requirement Levels", BCP 14, RFC 2119, March 1997.

   [MAIL]        Resnick, P., "Internet Message Format", RFC 5322,
                 October 2008.

9.2.  Informative References

   [DKIM]        Allman, E., Callas, J., Delany, M., Libbey, M., Fenton,
                 J., and M. Thomas, "DomainKeys Identified Mail (DKIM)
                 Signatures", RFC 4871, May 2007.

   [EMAIL-ARCH]  Crocker, D., "Internet Mail Architecture", RFC 5598,
                 July 2009.

   [MIME]        Freed, N. and N. Borenstein, "Multipurpose Internet
                 Mail Extensions (MIME) Part One: Format of Internet
                 Message Bodies", RFC 2045, November 1996.

   [RFC822]      Crocker, D., "Standard for the Format of Internet Text
                 Messages", RFC 822, August 1982.

Appendix A.  Examples

   Examples, if needed, can go here.

Appendix B.  Acknowledgements

   The author wishes to acknowledge the following for their review and
   constructive criticism of this proposal: (names)

Author's Address

   Murray S. Kucherawy
   Cloudmark, Inc.
   128 King St., 2nd Floor
   San Francisco, CA  94107

   Phone: +1 415 946 3800
   EMail: msk@cloudmark.com

Kucherawy                 Expires May 11, 2011                  [Page 8]

Html markup produced by rfcmarkup 1.129d, available from https://tools.ietf.org/tools/rfcmarkup/