draft-ietf-usefor-message-id-00.txt   draft-ietf-usefor-message-id-01.txt 
Internet Draft Matt Curtin Internet Draft Matt Curtin
draft-ietf-usefor-message-id-00.txt The Ohio State University draft-ietf-usefor-message-id-01.txt The Ohio State University
Category-to-be: Informational Jamie Zawinski Category-to-be: Informational Jamie Zawinski
Netscape Communications Netscape Communications
June 1998 July 1998
Expires: Six Months from above date Expires: Six Months from above date
Recommendations for generating Message IDs Recommendations for generating Message IDs
Status of this Memo Status of this Memo
This document is an Internet-Draft. Internet-Drafts are working This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, and documents of the Internet Engineering Task Force (IETF), its areas, and
its working groups. Note that other groups may also distribute working its working groups. Note that other groups may also distribute working
documents as Internet-Drafts. documents as Internet-Drafts.
skipping to change at line 40 skipping to change at line 40
Abstract Abstract
This draft provides recommendations on how to generate globally unique This draft provides recommendations on how to generate globally unique
Message IDs in client software. Message IDs in client software.
Table of Contents Table of Contents
1. Introduction 1. Introduction
2. Message-ID formatting 2. Message-ID formatting
3. Message-ID generation 3. Message-ID generation
3.1 "Domain part"
3.2 "Local part"
3.2.1 Sequence number
3.2.2 Using a pseudorandom number generator
3.2.3 Using a hash
3.3 Bringing it all together
4. Acknowledgments 4. Acknowledgments
5. References 5. References
6. Authors' addresses 6. Authors' addresses
1. Introduction 1. Introduction
Message-ID headers are used to uniquely identify Internet messages. Message-ID headers are used to uniquely identify Internet messages.
Having a unique identifier for each message has many benefits, Having a unique identifier for each message has many benefits,
including ease in the following of threads and intelligent scoring of including ease in the following of threads and intelligent scoring of
messages based on threads to which they belong. messages based on threads to which they belong.
skipping to change at line 75 skipping to change at line 81
quoting conventions within the local part, so software generating a quoting conventions within the local part, so software generating a
Message-ID would be well-advised to avoid this pitfall. Message-ID would be well-advised to avoid this pitfall.
It is also noted that some buggy software considers message IDs It is also noted that some buggy software considers message IDs
completely case-insensitive, in violation of the standards. It is completely case-insensitive, in violation of the standards. It is
therefore advised that one not generate IDs such that two IDs so therefore advised that one not generate IDs such that two IDs so
generated can differ only in character case. generated can differ only in character case.
3. Message-ID generation 3. Message-ID generation
The most popular method of generating local parts is to use the date and As shown above, the Message-ID is made up of two sections. We'll
time, plus some way of distinguishing between simultaneous postings on consider each separately.
the same host (e.g. a process number), and encode them in a suitably-
restricted alphabet. An older but now less-popular alternative is to 3.1. "Domain part"
use a sequence number, incremented each time the host generates a new
message ID; this is workable, but requires careful design to cope
properly with simultaneous posting attempts, and is not as robust in the
presence of crashes and other malfunctions.
On many client systems, it is not always possible to get the On many client systems, it is not always possible to get the
fully-qualified domain name (FQDN) of the local host. In that fully-qualified domain name (FQDN) of the local host. In that
situation, a reasonable fallback course of action would be to use the situation, a reasonable fallback course of action would be to use the
domain-part of the user's return address. Doing so makes the generation domain-part of the user's return address. (Use of an unqualified
of the "distinguishing number" be more important; in particular, it hostname for the domain part of the Message-ID header would be
foolish, and should never be done.)
Using the domain-part of the user's return address makes the
generation of the "local part" more important; in particular, it
means that a process ID is probably not sufficient. means that a process ID is probably not sufficient.
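The fallback described above can be sketched in Python as follows. This is only an illustration, not part of the draft: the function name choose_domain_part is hypothetical, and treating "contains a dot" as "fully qualified" is a crude assumption made for brevity.

```python
def choose_domain_part(host_name, return_address):
    """Pick a domain part for a Message-ID.

    Assumption (not from the draft): a name is treated as fully
    qualified if it contains at least one dot.
    """
    candidate = host_name.strip().rstrip(".")
    if "." in candidate:
        return candidate
    # Fall back to the domain-part of the user's return address.
    _, sep, domain = return_address.strip().rpartition("@")
    domain = domain.rstrip(".")
    if not sep or "." not in domain:
        # Never emit an unqualified hostname as the domain part.
        raise ValueError("no fully-qualified domain available")
    return domain
```

A caller might pass the result of socket.getfqdn() as host_name; on client systems that often returns an unqualified name, in which case the return address is used instead.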
An alternative for generating the distinguishing number, on systems 3.2. "Local part"
where the process ID isn't available, or in the case where the local
host's FQDN isn't known, is to generate a large random number from a
high-quality, well-seeded pseudorandom number generator. (Note that the
RNGs shipped by many vendors are not high quality.)
In summary, one possible approach to generating a Message-ID would be: The most popular method of generating local parts is to use the date and
time, plus some way of distinguishing between simultaneous postings on
the same host (e.g. a process number), and encode them in a suitably-
restricted alphabet.
* Append "<". A number of approaches here are possible. Each has its advantages and
drawbacks. The importance of the local part's uniqueness increases
with the frequency of messages being generated in a given domain.
Using several of these methods together will produce a Message-ID that
is longer, but significantly less likely to collide.
* Get the current (wall-clock) time in the highest resolution to 3.2.1. Sequence number
which you have access (most systems can give it to you in
milliseconds, but seconds will do);
* Generate 64 bits of randomness from a good, well-seeded random An older but now less-popular alternative is to use a sequence number,
number generator; incremented each time the host generates a new message ID; this is
workable for servers, but requires careful design to cope properly
with simultaneous posting attempts, and is not as robust in the
presence of crashes and other malfunctions. For client Message-ID
generation, particularly on hosts where the exact FQDN cannot be
obtained, or is subject to change, this might not even be workable.
* Convert these two numbers to base 36 (0-9 and A-Z) and append the 3.2.2. Using a pseudorandom number generator
first number, a ".", the second number, and an "@". This makes the
left hand side of the message ID be only about 21 characters long.
* Append the FQDN of the local host, or the host name in the user's One could take 64 bits from a good, well-seeded pseudorandom number
return address. generator [PRNG] in order to significantly increase the uniqueness of
the Message-ID. The advantage of this method is that it is fast and
generally effective. The disadvantage is that, even with a perfect
random number generator, the probability of getting the same number
twice in a row is exactly the same as that of getting any other
particular pair of numbers, so collisions remain possible.
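A minimal sketch of this approach, assuming Python and its standard "secrets" module as the well-seeded generator (the draft itself names no particular generator). The helper names to_base36 and random_component are hypothetical. The value is encoded in a single-case base 36 alphabet, so that two IDs cannot differ only in character case.

```python
import secrets

# Single-case alphabet: digits 0-9 followed by A-Z.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def to_base36(n):
    """Encode a non-negative integer in base 36."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    digits = []
    while n:
        n, r = divmod(n, 36)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))

def random_component():
    """Return 64 random bits as a base 36 string (at most 13 characters)."""
    return to_base36(secrets.randbits(64))
```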
* Append ">". 3.2.3. Using a hash
If the random number generator is good, this will reduce the odds of a Another approach would be to generate a hash of the message and use
collision of message IDs to well below the odds that a cosmic ray will that after the timestamp. If this is done well, this can also
cause the computer to miscompute a result. That means that it's good significantly reduce the opportunity for collision, and will generate
enough. a value that is relatively unique. Note that, in practice, this is
more difficult than it sounds. It is recommended that a
cryptographically secure hash function [SHA1, MD5] be used, as
others, such as CRC, are likely to have higher instances of collision.
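As an illustration only, a hash component might be derived as below. SHA-1 is used because it is one of the functions named above; truncating the digest to 16 characters and the function name hash_component are assumptions of this sketch, not recommendations of the draft.

```python
import hashlib

def hash_component(message_bytes, length=16):
    """Return the first `length` hex digits (upper case) of the SHA-1
    digest of the message. Truncation is an assumption made here to
    keep the local part short; it raises the chance of collision.
    """
    digest = hashlib.sha1(message_bytes).hexdigest().upper()
    return digest[:length]
```

Upper-case hexadecimal stays within the 0-9 and A-Z alphabet, so the result can be placed after the timestamp without further encoding. Another function from hashlib could be substituted without changing the rest of the sketch.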
There are many other approaches. This is provided only as an example. 3.3. Bringing it all together
In summary, the approaches to generating a Message-ID that we'll
consider here are in the following format:
1 Append "<".
2 Get the current time in the highest resolution to which you have
access (at least seconds, though most systems will give you
milliseconds) and generate a timestamp in the format
yyyymmddHHMMSS.ss;
3 Generate additional data to prevent Message-ID collision on two
messages processed by the same host at precisely the same
moment. (See section 3.2.)
4 Convert these two numbers to base 36 (0-9 and A-Z), and write the
first number, then additional parts, each section separated by a
".", and an "@".
5 Append the FQDN of the local host, or the host name in the user's
return address.
6 Append ">".
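The steps above can be sketched as follows. This is one possible reading, under stated assumptions: the timestamp digits (yyyymmddHHMMSS followed by hundredths of a second) are read as a single integer before base 36 conversion, UTC is used for the clock, the additional data is 64 random bits, and the name make_message_id is hypothetical.

```python
import secrets
from datetime import datetime, timezone

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def to_base36(n):
    """Encode a non-negative integer in base 36 (0-9 and A-Z)."""
    if n == 0:
        return "0"
    digits = []
    while n:
        n, r = divmod(n, 36)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))

def make_message_id(domain, now=None, random_bits=None):
    """Build "<timestamp.random@domain>" with both numbers in base 36."""
    if now is None:
        now = datetime.now(timezone.utc)      # assumption: UTC clock
    if random_bits is None:
        random_bits = secrets.randbits(64)    # assumption: 64 random bits
    # yyyymmddHHMMSS plus two digits of hundredths, read as one integer.
    hundredths = now.microsecond // 10000
    stamp = int(now.strftime("%Y%m%d%H%M%S") + "%02d" % hundredths)
    local = to_base36(stamp) + "." + to_base36(random_bits)
    return "<" + local + "@" + domain + ">"
```

The now and random_bits parameters exist only so the function can be exercised with fixed inputs; ordinary callers would pass just the domain.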
4. Acknowledgments 4. Acknowledgments
This document is partially derived from an earlier, unrelated draft by This document is partially derived from an earlier, unrelated draft by
Henry Spencer. Henry Spencer.
5. References 5. References
Ref. Author, title IETF status (June 1998) Ref. Author, title IETF status (June 1998)
---------------------- ----------------------
--- ------------- --- -------------
[NEWS] M.R. Horton, R. Adams: "Standard Non-standard (but still [NEWS] M.R. Horton, R. Adams: "Standard Non-standard (but still
for interchange of USENET widely used as a de-facto for interchange of USENET widely used as a de-facto
messages", RFC 1036, December standard). messages", RFC 1036, December standard).
1987. 1987.
[SHA1] National Institute of Standards
and Technology (NIST), "Announcement
of Weakness in the Secure Hash
Standard", May 1994. (Update of
FIPS 180: "Secure Hash Standard".)
[MD5] R. Rivest: "The MD5 Message-Digest Informational (but
Algorithm", RFC 1321, April 1992. widely used as a
de-facto standard).
[PRNG] D. Eastlake, 3rd, S. Crocker, Informational.
J. Schiller: "Randomness
Recommendations for Security",
RFC 1750, December 1994.
6. Authors' Addresses 6. Authors' Addresses
Matt Curtin Matt Curtin
The Ohio State University The Ohio State University
791 Dreese Laboratories 791 Dreese Laboratories
2015 Neil Ave 2015 Neil Ave
Columbus OH 43210 Columbus OH 43210
+1 614 292 7352 +1 614 292 7352
cmcurtin@cis.ohio-state.edu cmcurtin@cis.ohio-state.edu