draft-ietf-imapext-thread-10.txt   draft-ietf-imapext-thread-11.txt 
IMAP Extensions Working Group M. Crispin IMAP Extensions Working Group M. Crispin
Internet Draft: IMAP THREAD K. Murchison Internet Draft: IMAP THREAD K. Murchison
Document: internet-drafts/draft-ietf-imapext-thread-10.txt June 2002 Document: internet-drafts/draft-ietf-imapext-thread-11.txt June 2002
INTERNET MESSAGE ACCESS PROTOCOL - THREAD EXTENSION INTERNET MESSAGE ACCESS PROTOCOL - THREAD EXTENSION
Status of this Memo Status of this Memo
This document is an Internet-Draft and is in full conformance with This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC 2026. all provisions of Section 10 of RFC 2026.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 1, line 33 skipping to change at page 1, line 33
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt http://www.ietf.org/ietf/1id-abstracts.txt
To view the list of Internet-Draft Shadow Directories, see To view the list of Internet-Draft Shadow Directories, see
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
A revised version of this draft document will be submitted to the RFC A revised version of this draft document will be submitted to the RFC
editor as a Proposed Standard for the Internet Community. Discussion editor as a Proposed Standard for the Internet Community. Discussion
and suggestions for improvement are requested, and should be sent to and suggestions for improvement are requested, and should be sent to
ietf-imapext@IMC.ORG. This document will expire before 20 December ietf-imapext@IMC.ORG. This document will expire before 22 December
2002. Distribution of this memo is unlimited. 2002. Distribution of this memo is unlimited.
Abstract Abstract
This document describes the server-based threading extension to the This document describes the server-based threading extension to the
IMAP4rev1 protocol. This extension provides substantial performance IMAP4rev1 protocol. This extension provides substantial performance
improvements for IMAP clients which offer threaded views. improvements for IMAP clients which offer threaded views.
A server which supports this extension indicates this with one or A server which supports this extension indicates this with one or
more capability names consisting of "THREAD=" followed by a supported more capability names consisting of "THREAD=" followed by a supported
threading algorithm name as described in this document. This threading algorithm name as described in this document. This
provides for future upwards-compatible extensions. provides for future upwards-compatible extensions.
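As a rough illustration of the capability naming described above, a client might collect the advertised threading algorithms from the list of capability names. This is a minimal sketch: the function name thread_algorithms and the sample capability list are hypothetical, and case-insensitive matching of the "THREAD=" prefix is an assumption of this sketch.

```python
def thread_algorithms(capabilities):
    """Return the algorithm names advertised as "THREAD=<algorithm>"."""
    prefix = "THREAD="
    algorithms = []
    for name in capabilities:
        # Assumption: the prefix is matched case-insensitively.
        if name.upper().startswith(prefix) and len(name) > len(prefix):
            algorithms.append(name[len(prefix):].upper())
    return algorithms


# Hypothetical capability list, split from a CAPABILITY response.
caps = "IMAP4rev1 SORT THREAD=ORDEREDSUBJECT THREAD=REFERENCES".split()
print(thread_algorithms(caps))  # ['ORDEREDSUBJECT', 'REFERENCES']
```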
Extracted Subject Text Base Subject Text
Threading uses a version of the subject which has specific subject Threading uses the "base subject," which has specific subject
artifacts of deployed Internet mail software removed. Due to the artifacts of deployed Internet mail software removed. Due to the
complexity of these artifacts, the formal syntax for the subject complexity of these artifacts, the formal syntax for the subject
extraction rules is ambiguous. The following procedure is followed extraction rules is ambiguous. The following procedure is followed
to determine the actual "base subject" which is used to thread: to determine the actual "base subject" which is used to thread:
(1) Convert any RFC 2047 encoded-words in the subject to (1) Convert any RFC 2047 encoded-words in the subject to
UTF-8. Convert all tabs and continuations to space. UTF-8. Convert all tabs and continuations to space.
Convert all multiple spaces to a single space. Convert all multiple spaces to a single space.
(2) Remove all trailing text of the subject that matches (2) Remove all trailing text of the subject that matches
skipping to change at page 4, line 10 skipping to change at page 4, line 10
dates; smaller sizes sort before larger sizes; and strings are dates; smaller sizes sort before larger sizes; and strings are
sorted according to ascending values established by their sorted according to ascending values established by their
collation algorithm (see under "Internationalization collation algorithm (see under "Internationalization
Considerations"). Considerations").
The defined threading algorithms are as follows: The defined threading algorithms are as follows:
ORDEREDSUBJECT ORDEREDSUBJECT
The ORDEREDSUBJECT threading algorithm is also referred to as The ORDEREDSUBJECT threading algorithm is also referred to as
"poor man's threading." The searched messages are sorted by "poor man's threading." The searched messages are sorted by
subject and then by the sent date. The messages are then split base subject and then by the sent date. The messages are then
into separate threads, with each thread containing messages split into separate threads, with each thread containing
with the same extracted subject text. Finally, the threads are messages with the same base subject text. Finally, the threads
sorted by the sent date of the first message in the thread. are sorted by the sent date of the first message in the thread.
Note that each message in a thread is a child (as opposed to a Note that each message in a thread is a child (as opposed to a
sibling) of the previous message. sibling) of the previous message.
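The ORDEREDSUBJECT steps above can be sketched as follows. This is an illustration only, not a reference implementation: the tuple layout and the function name ordered_subject_threads are hypothetical, and the base subject is assumed to have been extracted and normalized for comparison already.

```python
from datetime import datetime
from itertools import groupby


def ordered_subject_threads(messages):
    """messages: list of (seq_number, base_subject, sent_date) tuples.

    Returns a list of threads; each thread is a list of sequence numbers
    in which every message is the child of the one before it.
    """
    # Sort by base subject, then by sent date.
    ordered = sorted(messages, key=lambda m: (m[1], m[2]))
    # Split into threads of messages that share a base subject.
    threads = [list(g) for _, g in groupby(ordered, key=lambda m: m[1])]
    # Sort the threads by the sent date of their first message.
    threads.sort(key=lambda thread: thread[0][2])
    return [[m[0] for m in thread] for thread in threads]


msgs = [
    (1, "lunch", datetime(2002, 6, 3)),
    (2, "budget", datetime(2002, 6, 1)),
    (3, "lunch", datetime(2002, 6, 2)),
    (4, "budget", datetime(2002, 6, 4)),
]
print(ordered_subject_threads(msgs))  # [[2, 4], [3, 1]]
```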
REFERENCES REFERENCES
The REFERENCES threading algorithm is based on the algorithm The REFERENCES threading algorithm is based on the algorithm
written by Jamie Zawinski which was used in "Netscape Mail and written by Jamie Zawinski which was used in "Netscape Mail and
News" versions 2.0 through 3.0. For details, see News" versions 2.0 through 3.0. For details, see
http://www.jwz.org/doc/threading.html. http://www.jwz.org/doc/threading.html.
This algorithm threads the searched messages by grouping them This algorithm threads the searched messages by grouping them
together in parent/child relationships based on which messages together in parent/child relationships based on which messages
are replies to others. The parent/child relationships are are replies to others. The parent/child relationships are
built using two methods: reconstructing a message's ancestry built using two methods: reconstructing a message's ancestry
using the references contained within it; and checking the using the references contained within it; and checking the
subject of a message to see if it is a reply to (or forward of) original (not base) subject of a message to see if it is a
another. reply to (or forward of) another message.
Note: "Message ID" in the following description refers to a Note: "Message ID" in the following description refers to a
normalized form of the msg-id in [RFC 2822]. The actual normalized form of the msg-id in [RFC 2822]. The actual
text in an RFC 2822 message may use quoting, resulting in multiple text in an RFC 2822 message may use quoting, resulting in multiple
ways of expressing the same Message ID. Implementations of ways of expressing the same Message ID. Implementations of
the REFERENCES threading algorithm MUST normalize any msg-id the REFERENCES threading algorithm MUST normalize any msg-id
in order to avoid false non-matches due to differences in in order to avoid false non-matches due to differences in
quoting. quoting.
For example, the msg-id For example, the msg-id
skipping to change at page 5, line 23 skipping to change at page 5, line 23
discipline has not been followed. For example, discipline has not been followed. For example,
In-Reply-To headers have been observed with email In-Reply-To headers have been observed with email
addresses after the Message ID, and there are no good addresses after the Message ID, and there are no good
heuristics for software to determine the difference. heuristics for software to determine the difference.
This is not a problem with the References header however. This is not a problem with the References header however.
If a message does not contain an In-Reply-To header line, or If a message does not contain an In-Reply-To header line, or
the In-Reply-To header line does not contain a valid Message the In-Reply-To header line does not contain a valid Message
ID, then the message does not have any references (NIL). ID, then the message does not have any references (NIL).
A message is considered to be a reply or forward the subject A message is considered to be a reply or forward if the base
extraction rules, applied to the original subject, remove any subject extraction rules, applied to the original subject,
of the following: a subj-refwd, a "(fwd)" subj-trailer, or a remove any of the following: a subj-refwd, a "(fwd)"
subj-fwd-hdr and subj-fwd-trl. subj-trailer, or a subj-fwd-hdr and subj-fwd-trl.
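The reply-or-forward test above can be approximated with regular expressions. This is a simplified sketch, not the full iterative extraction procedure: the pattern names and is_reply_or_forward are hypothetical, BLOBCHAR is assumed to exclude "[" and "]", leading subj-blobs are assumed to be permitted before a subj-refwd, and subj-fwd-trl is assumed to be a closing "]".

```python
import re

BLOB = r"\[[^\[\]]*\][ \t]*"  # assumed shape of subj-blob

# subj-refwd, optionally preceded by blobs: "Re:", "Fw:", "Fwd:", "Re [x]:"
SUBJ_REFWD = re.compile(
    r"^[ \t]*(?:" + BLOB + r")*(?:re|fwd?)[ \t]*(?:" + BLOB + r")?:",
    re.IGNORECASE,
)
# "(fwd)" subj-trailer at the end of the subject
FWD_TRAILER = re.compile(r"\(fwd\)[ \t]*$", re.IGNORECASE)
# subj-fwd-hdr ... subj-fwd-trl wrapping the whole subject: "[fwd: ...]"
FWD_WRAPPER = re.compile(r"^[ \t]*\[fwd:.*\][ \t]*$", re.IGNORECASE)


def is_reply_or_forward(original_subject):
    """True if the original subject carries a reply or forward artifact."""
    return bool(
        SUBJ_REFWD.search(original_subject)
        or FWD_TRAILER.search(original_subject)
        or FWD_WRAPPER.search(original_subject)
    )


print(is_reply_or_forward("Re: lunch"))  # True
print(is_reply_or_forward("lunch"))      # False
```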
The REFERENCES algorithm is significantly more complex than The REFERENCES algorithm is significantly more complex than
ORDEREDSUBJECT and consists of six main steps. These steps are ORDEREDSUBJECT and consists of six main steps. These steps are
outlined in detail below. outlined in detail below.
(1) For each searched message: (1) For each searched message:
(A) Using the Message IDs in the message's references, link (A) Using the Message IDs in the message's references, link
the corresponding messages (those whose Message-ID header the corresponding messages (those whose Message-ID header
line contains the given reference Message ID) together as line contains the given reference Message ID) together as
skipping to change at page 7, line 11 skipping to change at page 7, line 11
(4) Sort the messages under the root (top-level siblings only) (4) Sort the messages under the root (top-level siblings only)
by sent date. In the case of an exact match on sent date or if by sent date. In the case of an exact match on sent date or if
either of the Date: headers used in a comparison can not be either of the Date: headers used in a comparison can not be
parsed, use the order in which the messages appear in the parsed, use the order in which the messages appear in the
mailbox (that is, by sequence number) to determine the order. mailbox (that is, by sequence number) to determine the order.
In the case of a dummy message, sort its children by sent date In the case of a dummy message, sort its children by sent date
and then use the first child for the top-level sort. and then use the first child for the top-level sort.
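Step (4) might be sketched as below. The Node class and function names are hypothetical, dummies are assumed to have at least one non-dummy child by this point, and an unparseable Date: header is represented as None. Because the fallback to sequence number is applied per comparison, the ordering of messages with unparseable dates depends on which pairs the sort happens to compare; the sketch does not try to resolve that.

```python
from dataclasses import dataclass, field
from datetime import datetime
from functools import cmp_to_key
from typing import List, Optional


@dataclass
class Node:
    seq: Optional[int] = None        # sequence number; None marks a dummy
    sent: Optional[datetime] = None  # parsed sent date; None if unparseable
    children: List["Node"] = field(default_factory=list)


def compare(a, b):
    # Use sent date when both dates parse and differ.
    if a.sent is not None and b.sent is not None and a.sent != b.sent:
        return -1 if a.sent < b.sent else 1
    # Exact match or unparseable date: fall back to mailbox order.
    return a.seq - b.seq


def sort_top_level(root):
    keyed = []
    for node in root.children:
        if node.seq is None:
            # Dummy: sort its children, then let the first child stand in.
            node.children.sort(key=cmp_to_key(compare))
            keyed.append((node.children[0], node))
        else:
            keyed.append((node, node))
    keyed.sort(key=cmp_to_key(lambda x, y: compare(x[0], y[0])))
    root.children = [node for _, node in keyed]
```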
(5) Gather together messages under the root that have the same (5) Gather together messages under the root that have the same
extracted subject text. base subject text.
(A) Create a table for associating extracted subjects with (A) Create a table for associating base subjects with
messages. messages, called the subject table.
(B) Populate the subject table with one message per (B) Populate the subject table with one message per each
extracted subject. For each child of the root: base subject. For each child of the root:
(i) Find the subject of this thread by extracting the (i) Find the subject of this thread, by using the base
base subject from the current message, or its first child subject from either the current message or its first
if the current message is a dummy. child if the current message is a dummy. This is the
thread subject.
(ii) If the extracted subject is empty, skip this (ii) If the thread subject is empty, skip this message.
message.
(iii) Lookup the message associated with this extracted (iii) Look up the message associated with the thread
subject in the table. subject in the subject table.
(iv) If there is no message in the table with this (iv) If there is no message in the subject table with the
subject, add the current message and the extracted thread subject, add the current message and the thread
subject to the subject table. subject to the subject table.
Otherwise, replace the message in the table with the Otherwise, if the message in the subject table is not a
current message if the message in the table is not a dummy, AND either of the following criteria are true:
dummy AND either of the following criteria are true:
The current message is a dummy, OR The current message is a dummy, OR
The message in the table is a reply or forward and the The message in the subject table is a reply or forward
current message is not. and the current message is not.
(C) Merge threads with the same subject. For each child of then replace the message in the subject table with the
the root: current message.
(i) Find the subject of this thread as in step 5.B.i (C) Merge threads with the same thread subject. For each
child of the root:
(i) Find the message's thread subject as in step 5.B.i
above. above.
(ii) If the extracted subject is empty, skip this (ii) If the thread subject is empty, skip this message.
message.
(iii) Lookup the message associated with this extracted (iii) Lookup the message associated with this thread
subject in the table. subject in the subject table.
(iv) If the message in the table is the current message, (iv) If the message in the subject table is the current
skip this message. message, skip this message.
Otherwise, merge the current message with the one in the Otherwise, merge the current message with the one in the
table using the following rules: subject table using the following rules:
If both messages are dummies, append the current If both messages are dummies, append the current
message's children to the children of the message in message's children to the children of the message in
the table (the children of both messages become the subject table (the children of both messages
siblings), and then delete the current message. become siblings), and then delete the current message.
If the message in the table is a dummy and the current If the message in the subject table is a dummy and the
message is not, make the current message a child of current message is not, make the current message a
the message in the table (a sibling of it's children). child of the message in the subject table (a sibling
of its children).
If the current message is a reply or forward and the If the current message is a reply or forward and the
message in the table is not, make the current message message in the subject table is not, make the current
a child of the message in the table (a sibling of it's message a child of the message in the subject table (a
children). sibling of its children).
Otherwise, create a new dummy message and make both Otherwise, create a new dummy message and make both
the current message and the message in the table the current message and the message in the subject
children of the dummy. Then replace the message in table children of the dummy. Then replace the message
the table with the dummy message. in the subject table with the dummy message.
Note: Subject comparisons are case-insensitive, as Note: Subject comparisons are case-insensitive, as
described under "Internationalization described under "Internationalization
Considerations." Considerations."
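Steps 5.A through 5.C, as worded in the newer text, can be sketched as follows. This is an illustration under stated assumptions rather than a definitive implementation: the Node class and function names are hypothetical, each node's base subject and reply-or-forward flag are assumed to be precomputed, a dummy is assumed to have a non-dummy first child, and str.lower() stands in for the case-insensitive collation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(eq=False)  # identity comparison, so list.remove() finds the exact node
class Node:
    subject: str = ""       # base subject (assumed already extracted)
    is_reply: bool = False  # reply or forward, judged from the original subject
    dummy: bool = False
    children: List["Node"] = field(default_factory=list)


def thread_subject(node):
    source = node.children[0] if node.dummy else node
    return source.subject.lower()  # stand-in for the collation


def gather_by_subject(root):
    table: Dict[str, Node] = {}  # 5.A: the subject table
    for node in root.children:   # 5.B: populate it
        subj = thread_subject(node)
        if not subj:
            continue
        old = table.get(subj)
        if old is None:
            table[subj] = node
        elif not old.dummy and (node.dummy or (old.is_reply and not node.is_reply)):
            table[subj] = node
    for node in list(root.children):  # 5.C: merge threads
        subj = thread_subject(node)
        if not subj:
            continue
        target = table[subj]
        if target is node:
            continue
        root.children.remove(node)
        if target.dummy and node.dummy:
            target.children.extend(node.children)  # current dummy is dropped
        elif target.dummy or (node.is_reply and not target.is_reply):
            target.children.append(node)
        else:
            merged = Node(dummy=True, children=[target, node])
            root.children[root.children.index(target)] = merged
            table[subj] = merged
```

For example, a root holding "Re: lunch" (flagged as a reply) followed by "Lunch" ends up with the reply as a child of the non-reply, while two unrelated non-replies with the same base subject become siblings under a new dummy.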
(6) Traverse the messages under the root and sort each set of (6) Traverse the messages under the root and sort each set of
siblings by sent date. Traverse the messages in such a way siblings by sent date. Traverse the messages in such a way
that the "youngest" set of siblings are sorted first, and the that the "youngest" set of siblings are sorted first, and the
"oldest" set of siblings are sorted last (grandchildren are "oldest" set of siblings are sorted last (grandchildren are
sorted before children, etc). In the case of an exact match on sorted before children, etc). In the case of an exact match on
skipping to change at page 12, line 20 skipping to change at page 12, line 20
thread-members = nz-number *(SP nz-number) [SP thread-nested] thread-members = nz-number *(SP nz-number) [SP thread-nested]
thread-nested = 2*thread-list thread-nested = 2*thread-list
thread = ["UID" SP] "THREAD" SP thread-algorithm thread = ["UID" SP] "THREAD" SP thread-algorithm
SP search-charset 1*(SP search-key) SP search-charset 1*(SP search-key)
thread-algorithm = "ORDEREDSUBJECT" / "REFERENCES" / atom thread-algorithm = "ORDEREDSUBJECT" / "REFERENCES" / atom
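As an informal reading of the thread rule above, a client could assemble the command text like this. The helper name build_thread_command is hypothetical, the command tag is omitted, and each search key is assumed to be an already-formatted string.

```python
def build_thread_command(algorithm, charset, search_keys, uid=False):
    """Assemble ["UID" SP] "THREAD" SP algorithm SP charset 1*(SP search-key)."""
    if not search_keys:
        raise ValueError("at least one search key is required")
    parts = (["UID"] if uid else []) + ["THREAD", algorithm, charset]
    parts.extend(search_keys)
    return " ".join(parts)


print(build_thread_command("REFERENCES", "UTF-8", ["SINCE 5-MAR-2000"], uid=True))
# UID THREAD REFERENCES UTF-8 SINCE 5-MAR-2000
```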
The following syntax describes subject extraction rules (2)-(6): The following syntax describes base subject extraction rules (2)-(6):
subject = *subj-leader [subj-middle] *subj-trailer subject = *subj-leader [subj-middle] *subj-trailer
subj-refwd = ("re" / ("fw" ["d"])) *WSP [subj-blob] ":" subj-refwd = ("re" / ("fw" ["d"])) *WSP [subj-blob] ":"
subj-blob = "[" *BLOBCHAR "]" *WSP subj-blob = "[" *BLOBCHAR "]" *WSP
subj-fwd = subj-fwd-hdr subject subj-fwd-trl subj-fwd = subj-fwd-hdr subject subj-fwd-trl
subj-fwd-hdr = "[fwd:" subj-fwd-hdr = "[fwd:"
skipping to change at page 14, line 6 skipping to change at page 14, line 6
internationalization. internationalization.
It is anticipated that there will be a generic Unicode sorting It is anticipated that there will be a generic Unicode sorting
collation, which will provide generic case-insensitivity for collation, which will provide generic case-insensitivity for
alphabetic scripts, specification of composed character handling, and alphabetic scripts, specification of composed character handling, and
language-specific sorting collations. A server which implements language-specific sorting collations. A server which implements
non-default sorting collations will modify its sorting behavior non-default sorting collations will modify its sorting behavior
according to the selected sorting collation. according to the selected sorting collation.
Non-English translations of "Re" or "Fw"/"Fwd" are not specified for Non-English translations of "Re" or "Fw"/"Fwd" are not specified for
removal in the extracted subject text process. By specifying that removal in the base subject extraction process. By specifying that
only the English forms of the prefixes are used, it becomes a simple only the English forms of the prefixes are used, it becomes a simple
display time task to localize the prefix language for the user. If, display time task to localize the prefix language for the user. If,
on the other hand, prefixes in multiple languages are permitted, the on the other hand, prefixes in multiple languages are permitted, the
result is a geometrically complex, and ultimately unimplementable, result is a geometrically complex, and ultimately unimplementable,
task. In order to improve the ability to support non-English display task. In order to improve the ability to support non-English display
in Internet mail clients, only the English form of these prefixes in Internet mail clients, only the English form of these prefixes
should be transmitted in Internet mail messages. should be transmitted in Internet mail messages.
A. References A. References
 End of changes. 
