Internet Draft                                              Matt Curtin
draft-ietf-usefor-message-id-01.txt           The Ohio State University
Category-to-be: Informational                            Jamie Zawinski
                                                Netscape Communications

                                                              July 1998
                                    Expires: Six Months from above date

	      Recommendations for generating Message IDs

			 Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, and
its working groups. Note that other groups may also distribute working
documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as ``work in progress.''

To view the entire list of current Internet-Drafts, please check the
"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), ftp.nordu.net (Northern Europe),
ftp.nis.garr.it (Southern Europe), munnari.oz.au (Pacific Rim),
ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast).

			       Abstract

This draft provides recommendations on how to generate globally unique
Message IDs in client software.

Table of Contents

1. Introduction
2. Message-ID formatting
3. Message-ID generation
3.1 "Domain part"
3.2 "Local part"
3.2.1 Sequence number
3.2.2 Using a pseudorandom number generator
3.2.3 Using a hash
3.3 Bringing it all together
4. Acknowledgments
5. References
6. Authors' addresses

1. Introduction

Message-ID headers are used to uniquely identify Internet messages.
Having a unique identifier for each message has many benefits,
including ease in the following of threads and intelligent scoring of
messages based on threads to which they belong.

It has been suggested that it is impossible for client software to
generate globally-unique Message-IDs.  We believe this to be
incorrect, and herein offer suggestions for generating unique
Message-IDs.

2. Message-ID formatting

As defined in [NEWS], a message ID consists of two parts, a local part
and a domain, separated by an at-sign and enclosed in angle brackets:

    message-id = "<" local-part "@" domain ">"

Practically, news message IDs are a restricted subset of mail message
IDs.  In particular, no existing news software copes properly with mail
quoting conventions within the local part, so software generating a
Message-ID would be well-advised to avoid this pitfall.

It is also noted that some buggy software considers message IDs
completely case-insensitive, in violation of the standards.  It is
therefore advised that one not generate IDs such that two IDs so
generated can differ only in character case.
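
As a rough illustration of the two cautions above, the following
Python sketch checks that an ID avoids mail quoting in the local part
and that two IDs do not differ only in character case.  The accepted
character set is an assumption chosen to be conservative; it is
narrower than what [NEWS] permits, and the function names are
hypothetical.

```python
import re

# Assumption: a deliberately conservative alphabet for the local part
# (letters, digits, ".", "$", "_", "-") and a dotted domain.
_CONSERVATIVE_ID = re.compile(
    r"<[A-Za-z0-9.$_-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+>"
)


def is_conservative_message_id(message_id: str) -> bool:
    """Return True if message_id fits the conservative form above."""
    return _CONSERVATIVE_ID.fullmatch(message_id) is not None


def differ_only_in_case(id_a: str, id_b: str) -> bool:
    """Return True if two distinct IDs collide once case is ignored."""
    return id_a != id_b and id_a.lower() == id_b.lower()
```

A generator that emits letters in a single case only can never
produce two IDs for which the second check is true.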

3. Message-ID generation

As shown above, the Message-ID is made up of two sections.  We'll
consider each separately.

3.1. "Domain part"

On many client systems, it is not always possible to get the
fully-qualified domain name (FQDN) of the local host.  In that
situation, a reasonable fallback course of action would be to use the
domain-part of the user's return address.  (Use of an unqualified
hostname for the domain part of the Message-ID header would be
foolish, and should never be done.)

Using the domain-part of the user's return address makes the
generation of the "local part" more important; in particular, it
means that a process ID is probably not sufficient.
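
The fallback described above might be sketched in Python as follows.
The function name and the "contains a dot" test for a qualified name
are assumptions made for illustration; a real client may need a
stricter check.

```python
def choose_domain(local_fqdn, return_address):
    """Pick the domain part of a Message-ID.

    Prefer the host's own name when it looks fully qualified;
    otherwise fall back to the domain of the user's return address.
    An unqualified name is never returned.
    """
    candidate = (local_fqdn or "").strip().rstrip(".")
    if "." in candidate:
        return candidate

    # rpartition() splits on the last "@"; `at` is empty if none exists.
    _, at, domain = return_address.rpartition("@")
    domain = domain.strip()
    if not at or "." not in domain:
        raise ValueError("no fully-qualified domain available")
    return domain
```

The local_fqdn argument could be supplied from socket.getfqdn(),
which on many client systems returns only an unqualified host name.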

3.2. "Local part"

The most popular method of generating local parts is to use the date and
time, plus some way of distinguishing between simultaneous postings on
the same host (e.g. a process number), and encode them in a suitably-
restricted alphabet.

A number of approaches here are possible.  Each has its advantages and
drawbacks.  The importance of the local part's uniqueness increases
with the frequency of messages being generated in a given domain.
Using several of these methods together will produce a Message-ID that
is longer, but significantly less likely to collide.

3.2.1. Sequence number

An older but now less-popular alternative is to use a sequence number,
incremented each time the host generates a new message ID; this is
workable for servers, but requires careful design to cope properly
with simultaneous posting attempts, and is not as robust in the
presence of crashes and other malfunctions.

For client Message-ID generation, particularly on hosts where the
exact FQDN cannot be obtained, or is subject to change, this might not
even be workable.
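
A minimal Python sketch of such a counter follows.  It handles
simultaneous requests within one process by means of a lock, but it
keeps its state only in memory, so it shows exactly the weakness
noted above: after a crash or restart the sequence begins again.  The
class name is hypothetical.

```python
import threading


class SequenceNumbers:
    """In-memory sequence counter, safe across threads in one process.

    The state is not persisted, so numbers repeat after a restart.
    """

    def __init__(self, start=0):
        self._next = start
        self._lock = threading.Lock()

    def next(self):
        # The lock makes read-and-increment a single atomic step.
        with self._lock:
            value = self._next
            self._next += 1
            return value
```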

3.2.2. Using a pseudorandom number generator

One could take 64 bits from a good, well-seeded pseudorandom number
generator [PRNG] in order to significantly increase the uniqueness of
the Message-ID.  The advantage of this method is that it is fast and
generally effective.  The disadvantage is that in a perfect random
number generation scheme, the probability of getting the same number
twice in a row is exactly the same as the probability of getting any
other given pair of numbers.
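
To put a rough number on that risk, the Python sketch below draws 64
bits from the operating system's random source and estimates the
chance of any collision among a given count of such values.  The
estimate is the standard birthday upper bound, which is an addition
for illustration and not part of the original recommendations; the
function names are hypothetical.

```python
import secrets


def random_component(bits=64):
    """Draw `bits` random bits from the operating system's generator."""
    return secrets.randbits(bits)


def approx_collision_probability(count, bits=64):
    """Birthday upper bound: (number of pairs) / (number of values)."""
    pairs = count * (count - 1) / 2
    return pairs / 2 ** bits
```

For one million 64-bit values the bound is below one in ten million,
provided the generator really is well seeded.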

3.2.3. Using a hash

Another approach would be to generate a hash of the message and use
that after the timestamp.  If this is done well, this can also
significantly reduce the opportunity for collision, and will generate
a value that is relatively unique.  Note that, in practice, this is
more difficult than it sounds.  It is recommended that a
cryptographically secure hash function [SHA1, MD5] be used, as
others, such as CRC, are likely to have higher instances of collision.
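
A minimal sketch of the hash approach in Python, using SHA-1 from the
standard hashlib module, might look like this.  The truncation length
and the function name are assumptions made for illustration.

```python
import hashlib


def hash_component(message: bytes, length: int = 16) -> str:
    """Return the first `length` hex digits of the message's SHA-1.

    The output is lowercase hex, so two values never differ only in
    character case.
    """
    return hashlib.sha1(message).hexdigest()[:length]
```

Identical messages hash to identical values, which is why the hash is
used after the timestamp rather than on its own.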

3.3. Bringing it all together

In summary, one possible approach to generating a Message-ID would be:

  1  Append "<".

  2  Get the current (wall-clock) time in the highest resolution to
     which you have access (at least seconds, though most systems will
     give you milliseconds) and generate a timestamp in the format
     yyyymmddHHMMSS.ss.

  3  Generate additional data, such as 64 bits of randomness from a
     good, well-seeded random number generator, to prevent Message-ID
     collision on two messages processed by the same host at precisely
     the same moment.  (See section 3.2.)

  4  Convert these numbers to base 36 (0-9 and A-Z), and write the
     first number, then the additional parts, each section separated
     by a ".", and an "@".  With a single 64-bit random number as the
     additional part, this makes the left hand side of the message ID
     only about 21 characters long.

  5  Append the FQDN of the local host, or the host name in the user's
     return address.

  6  Append ">".

If the random number generator is good, this will reduce the odds of a
collision of message IDs to well below the odds that a cosmic ray will
cause the computer to miscompute a result.  That means that it's good
enough.

There are many other approaches.  This is provided only as an example.
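
The steps above can be sketched in Python as follows.  This is an
illustration under stated assumptions, not a reference
implementation: the timestamp is taken as milliseconds since the
epoch, the additional data is a single 64-bit random number, and the
function names are hypothetical.

```python
import secrets
import time

_DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_base36(number):
    """Encode a non-negative integer using 0-9 and A-Z."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return "0"
    digits = []
    while number:
        number, remainder = divmod(number, 36)
        digits.append(_DIGITS[remainder])
    return "".join(reversed(digits))


def make_message_id(domain, now_ms=None, random_bits=None):
    """Build "<time.random@domain>" with both numbers in base 36.

    now_ms and random_bits may be passed in for testing; by default
    the clock and the operating system's random source are used.
    """
    millis = int(time.time() * 1000) if now_ms is None else now_ms
    rand = secrets.randbits(64) if random_bits is None else random_bits
    return "<" + to_base36(millis) + "." + to_base36(rand) + "@" + domain + ">"
```

Because only upper-case letters are produced, two IDs generated this
way cannot differ only in character case.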

4. Acknowledgments

This document is partially derived from an earlier, unrelated draft by
Henry Spencer.

5. References

Ref.          Author, title                         IETF status (June 1998)
---           -------------                         ----------------------

[NEWS]        M.R. Horton, R. Adams: "Standard      Non-standard (but still
              for interchange of USENET             widely used as a de-facto
              messages", RFC 1036, December         standard).
              1987.

[SHA1]        National Institute of Standards
	      and Technology (NIST), "Announcement
	      of Weakness in the Secure Hash
	      Standard", May 1994.  (Update of
	      FIPS 180:  "Secure Hash Standard".)

[MD5]         R. Rivest: "The MD5 Message-Digest    Informational (but
              Algorithm", RFC 1321, April 1992.     widely used as a
                                                    de-facto standard).

[PRNG]        D. Eastlake, 3rd, S. Crocker,	    Informational.
	      J. Schiller: "Randomness
	      Recommendations for Security",
	      RFC 1750, December 1994.

6. Authors' Addresses

Matt Curtin
The Ohio State University
791 Dreese Laboratories
2015 Neil Ave
Columbus OH 43210
+1 614 292 7352
cmcurtin@cis.ohio-state.edu

Jamie Zawinski
Netscape Communications Corporation
501 East Middlefield Road
Mountain View, CA 94043
(650) 937-2620
jwz@netscape.com