draft-ietf-imapext-thread-00.txt   draft-ietf-imapext-thread-01.txt 
Network Working Group M. Crispin IMAP Extensions Working Group M. Crispin
Internet Draft: IMAP THREAD University of Washington Internet Draft: IMAP THREAD K. Murchison
Document: internet-drafts/draft-ietf-imapext-thread-00.txt February 2000 Document: internet-drafts/draft-ietf-imapext-thread-01.txt August 2000
INTERNET MESSAGE ACCESS PROTOCOL - THREAD EXTENSION INTERNET MESSAGE ACCESS PROTOCOL - THREAD EXTENSION
Status of this Memo Status of this Memo
This document is an Internet-Draft and is in full conformance with This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC 2026. all provisions of Section 10 of RFC 2026.
This document is an Internet-Draft. Internet-Drafts are working Internet-Drafts are working documents of the Internet Engineering
documents of the Internet Engineering Task Force (IETF), its areas, Task Force (IETF), its areas, and its working groups. Note that
and its working groups. Note that other groups may also distribute other groups may also distribute working documents as Internet-
working documents as Internet-Drafts. Drafts.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at To view the list Internet-Draft Shadow Directories, see
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
To learn the current status of any Internet-Draft, please check the
"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
Directories on ds.internic.net (US East Coast), nic.nordu.net
(Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific
Rim).
A revised version of this draft document will be submitted to the RFC A revised version of this draft document will be submitted to the RFC
editor as a Proposed Standard for the Internet Community. editor as a Proposed Standard for the Internet Community.
Discussion and suggestions for improvement are requested, and should Discussion and suggestions for improvement are requested, and should
be sent to ietf-imapext@IMC.ORG. This document will expire before 5 be sent to ietf-imapext@IMC.ORG. This document will expire before 10
July 2000. Distribution of this memo is unlimited. January 2001. Distribution of this memo is unlimited.
Abstract Abstract
This document describes the server-based threading extension to the This document describes the server-based threading extension to the
IMAP4rev1 protocol. This extension provides substantial performance IMAP4rev1 protocol. This extension provides substantial performance
improvements for IMAP clients which offer threaded views. improvements for IMAP clients which offer threaded views.
A server which supports this extension indicates this with more or A server which supports this extension indicates this with more or
more capability names consisting of "THREAD-" followed by a supported more capability names consisting of "THREAD=" followed by a supported
threading algorithm name as described in this document. This threading algorithm name as described in this document. This
provides for future upwards-compatible extensions. provides for future upwards-compatible extensions.
Extracted Subject Text Extracted Subject Text
Threading algorithms use a version of the subject which has specific Threading uses a version of the subject which has specific subject
subject artifacts of deployed Internet mail software removed. Due to artifacts of deployed Internet mail software removed. Due to the
the complexity of these artifacts, the above syntax is ambiguous. complexity of these artifacts, the formal syntax for the subject
The following procedure is followed to determing the actual "base extraction rules is ambiguous. The following procedure is followed
subject": to determining the actual "base subject" which is used to thread:
(1) Remove all trailing text of the subject that matches (1) Convert any RFC 2047 encoded-words in the subject to
UTF-8. Convert all tabs and continuations to space.
Convert all multiple spaces to a single space.
(2) Remove all trailing text of the subject that matches
the subj-trailer ABNF, repeat until no more matches are the subj-trailer ABNF, repeat until no more matches are
possible. possible.
(2) Remove all prefix text of the subject that matches (3) Remove all prefix text of the subject that matches the
subj-leader. subj-leader ABNF.
(3) If there is prefix text of the subject that matches (4) If there is prefix text of the subject that matches the
subj-blob, and removing that prefix leaves a non-empty subj-blob ABNF, and removing that prefix leaves a non-empty
subj-base, then remove the prefix text. subj-base, then remove the prefix text.
(4) Repeat (2) and (3) until no matches remain. (5) Repeat (3) and (4) until no matches remain.
(5) Convert any RFC 2047 encoded-words in the remaining Note: it is possible to defer step (2) until step (6), but this
subj-base to UTF-8. requires checking for subj-trailer in step (4).
(6) The resulting text is the "base subject" used in the (6) If the resulting text begins with the subj-fwd-hdr ABNF
THREAD. and ends with the subj-fwd-trl ABNF, remove the
subj-fwd-hdr and subj-fwd-trl and repeat from step (2).
(7) The resulting text is the "base subject" used in
threading.
All servers and disconnected clients MUST use exactly this algorithm All servers and disconnected clients MUST use exactly this algorithm
when threading. Otherwise there is potential for a user to get when threading. Otherwise there is potential for a user to get
inconsistant results based on whether they are running in connected inconsistant results based on whether they are running in connected
or disconnected IMAP mode. or disconnected IMAP mode.
Additional Commands Additional Commands
This command is an extension to the IMAP4rev1 base protocol. This command is an extension to the IMAP4rev1 base protocol.
skipping to change at page 3, line 39 skipping to change at page 3, line 39
charset. Note that unlike SEARCH, the searching charset argument charset. Note that unlike SEARCH, the searching charset argument
is mandatory. is mandatory.
There is also a UID THREAD command which corresponds to THREAD the There is also a UID THREAD command which corresponds to THREAD the
way that UID SEARCH corresponds to SEARCH. way that UID SEARCH corresponds to SEARCH.
The THREAD command first searches the mailbox for messages that The THREAD command first searches the mailbox for messages that
match the given searching criteria using the charset argument for match the given searching criteria using the charset argument for
the interpretation of strings in the searching criteria. It then the interpretation of strings in the searching criteria. It then
returns the matching messages in an untagged THREAD response, returns the matching messages in an untagged THREAD response,
threaded according to the specified threading algorithm. Unlike threaded according to the specified threading algorithm.
SEARCH, if no messages match the searching criteria in a THREAD
command, no untagged THREAD response is returned.
The defined threading algorithms are as follows: The defined threading algorithms are as follows:
ORDEREDSUBJECT ORDEREDSUBJECT
The ORDEREDSUBJECT threading algorithm is also referred to as The ORDEREDSUBJECT threading algorithm is also referred to as
"poor man's threading." The searched messages are sorted by "poor man's threading." The searched messages are sorted by
subject and then by sent date, equivalent to a "SORT (SUBJECT subject and then by sent date, equivalent to a "SORT (SUBJECT
DATE)". The messages are then split into separate threads, DATE)". The messages are then split into separate threads,
with each thread containing messages with the same extracted with each thread containing messages with the same extracted
subject text. Finally, the threads are sorted by the sent date subject text. Finally, the threads are sorted by the sent date
of the first message in the thread. of the first message in the thread.
Example: C: A283 THREAD ORDEREDSUBJECT UTF-8 SINCE 5-AUG-1999 Note that each message in a thread is a child (as opposed to a
S: * THREAD (146 151)(144 145)(155)(147 148 152 sibling) of the previous message.
167)(182)(181)(149)(154)(153)(164 170)(156)
(158 161 162)(157 160 163)(159)(183 185)(165) REFERENCES
(166)(168)(169)(171)(172)(173)(186)(174)(150) The REFERENCES threading algorithm is based on the algorithm
(175)(176)(177 179 184)(178)(180)(187) written by Jamie Zawinski which was used in "Netscape Mail and
News" versions 2.0 through 3.0. For details, see
http://www.jwz.org/docs/threading.html.
This algorithm threads the searched messages by grouping them
together in parent/child relationships based on which messages
are replies to others. The parent/child relationships are
built using two methods: reconstructing a message's ancestry
using the references contained within it; and checking the
subject of a message to see if it is a reply to (or forward of)
another.
The references used for reconstructing a message's ancestry are
found using the following rules:
If a message contains a [NEWS]-style References header, then
use the Message-IDs in this header as the references.
If a message does not contain a References header, or the
header does not contain any Message-IDs, then use the first
(if any) Message-ID found in the In-Reply-To header as the
only reference (parent) for this message.
If a message does not contain an In-Reply-To header, or the
header does not contain a Message-ID, then the message does
not have any references (NIL).
The REFERENCES algorithm is significantly more complex than
ORDEREDSUBJECT and consists of five main steps. These steps
are outlined in detail below.
(1) For each searched message:
(A) Using the Message-IDs in the message's references, link
the corresponding messages together as parent/child. Make
the first reference the parent of the second (and the second
a child of the first), the second the parent of the third
(and the third a child of the second), etc. The following
rules govern the creation of these links:
If no reference message can be found with a given
Message-ID, create a dummy message with this ID. Use
this dummy message for all subsequent references to this
ID.
If a reference message already has a parent, don't change
the existing link.
Do not create a parent/child link if creating that link
would introduce a loop. For example, before making
message A the parent of B, make sure that A is not a
descendent of B.
(B) Create a parent/child link between the last reference
(or NIL if there are no references) and the current message.
If the current message has a parent already, break the
current parent/child link before creating the new one. Note
that if this message has no references, that it will now
have no parent.
NOTE: The parent/child links MUST be kept consistent with
one another at ALL times.
(2) Gather together all of the messages that have no parents
and make them all children (siblings of one another) of a dummy
parent (the "root"). These messages constitute first messages
of the threads created thus far.
(3) Prune dummy messages from the thread tree. Traverse each
thread under the root, and for each message:
If it is a dummy message with NO children, delete it.
If it is a dummy message with children, delete it, but
promote its children to the current level. In other words,
splice them in with the dummy's siblings.
Do not promote the children if doing so would make them
children of the root, unless there is only one child.
(4) Gather together messages under the root that have the same
extracted subject text.
(A) Create a table for associating extracted subjects with
messages.
(B) Populate the subject table with one message per
extracted subject. For each message under the root:
(i) Find the subject of this thread by extracting the
base subject from the current message, or its first child
if the current message is a dummy.
(ii) Lookup the message associated with this extracted
subject in the table.
(iii) If there is no message in the table with this
subject, add the current message and the extracted
subject to the subject table.
Otherwise, replace the message in the table with the
current message if either of the following criteria are
true:
The current message is a dummy and the one in the
table is not, OR
The message in the table is a reply or forward (its
original subject contains a subj-refwd part and/or a
"(fwd)" subj-trailer) and the current message is not.
[OPTIONAL REPLACEMENT FOR CRITERION ABOVE] The message
in the table is a deeper reply or forward than the
current message (its original subject contains more
subj-refwd and "(fwd)" subj-trailer parts).
(C) Merge threads with the same subject. For each message
under the root:
(i) Find the subject of this thread as in step 4.B.i
above.
(ii) Lookup the message associated with this extracted
subject in the table.
(iii) If the message in the table is the current message,
skip it.
Otherwise, merge the current message with the one in the
table using the following rules:
If both messages are dummies, append the current
message's children to the children of the message in
the table (the children of both messages become
siblings), and then delete the current message.
If the message in the table is a dummy and the current
message is not, make the current message a child of
the message in the table (a sibling of it's children).
If the current message is a reply or forward and the
message in the table is not, make the current message
a child of the message in the table (a sibling of it's
children).
[OPTIONAL REPLACEMENT FOR RULE ABOVE] If the current
message is a deeper reply or forward than the one in
the table (its original subject contains more
subj-refwd and "(fwd)" subj-trailer parts), make the
current message a child of the message in the table (a
sibling of it's children).
Otherwise, create a new dummy container and make both
messages children of the dummy, and replace the
message in the table with the dummy message.
(5) Traverse the messages under the root and sort each set of
siblings by date.
Example: C: A283 THREAD ORDEREDSUBJECT UTF-8 SINCE 5-MAR-2000
S: * THREAD (166)(167)(168)(169)(172)(170)(171)
(173)(174 175 176 178 181 180)(179)(177 183
182 188 184 185 186 187 189)(190)(191)(192)
(193)(194 195)(196 197 198)(199)(200 202)(201)
(203)(204)(205)(206 207)(208)
S: A283 OK THREAD completed S: A283 OK THREAD completed
C: A284 THREAD ORDEREDSUBJECT US-ASCII TEXT "gewp" C: A284 THREAD ORDEREDSUBJECT US-ASCII TEXT "gewp"
S: * THREAD
S: A284 OK THREAD completed S: A284 OK THREAD completed
C: A285 THREAD REFERENCES UTF-8 SINCE 5-MAR-2000
S: * THREAD (166)(167)(168)(169)(172)((170)(179))
(171)(173)((174)(175)(176)(178)(181)(180))
((177)(183)(182)(188 (184)(189))(185 186)(187))
(190)(191)(192)(193)((194)(195 196))(197 198)
(199)(200 202)(201)(203)(204)(205 206 207)(208)
S: A285 OK THREAD completed
Note: The line breaks in the first client response are for Note: The line breaks in the first and third client
editorial clarity and do not appear in a real THREAD responses are for editorial clarity and do not appear in
response. real THREAD responses.
Additional Responses Additional Responses
This response is an extension to the IMAP4rev1 base protocol. This response is an extension to the IMAP4rev1 base protocol.
The section heading of this response is intended to correspond with The section heading of this response is intended to correspond with
where it would be located in the main document. where it would be located in the main document.
7.2.THREAD. THREAD Response 7.2.THREAD. THREAD Response
Data: one or more threads Data: one or more threads
The THREAD response occurs as a result of a THREAD or UID THREAD The THREAD response occurs as a result of a THREAD or UID THREAD
command. It contains one or more threads. A thread consists of a command. It contains one or more threads. A thread consists of a
parenthesized list of thread members. Thread members consist of parenthesized list of thread members.
one or more message numbers until the thread splits into multiple
sub-threads, at which point the thread nests into multiple sub- Thread members consist of one or more message numbers, delimited
threads. There is no limit to the nesting of threads. by spaces, indicating successive parent and child. This continues
until the thread splits into multiple sub-threads, at which point
the thread nests into multiple sub-threads with the first member
of each subthread being siblings at this level. There is no limit
to the nesting of threads.
The messages numbers refer to those messages that match the search The messages numbers refer to those messages that match the search
criteria. For THREAD, these are message sequence numbers; for UID criteria. For THREAD, these are message sequence numbers; for UID
THREAD, these are unique identifiers. THREAD, these are unique identifiers.
Example: S: * THREAD (2)(3 6 (4 23)(44 7 96)) Example: S: * THREAD (2)(3 6 (4 23)(44 7 96))
In this example, there are two threads. The first thread The first thread consists only of message 2. The second thread
consists only of message 2. The second thread consists consists of the messages 3 (parent) and 6 (child), after which it
of the messages 3 and 6, after which it splits into two splits into two subthreads; the first of which contains messages 4
subthreads; the first of which contains messages 4 and (child of 6, sibling of 44) and 23 (child of 4), and the second of
23, and the second of which contains messages 44, 7, and which contains messages 44 (child of 6, sibling of 4), 7 (child of
96. 44), and 96 (child of 7). Since some later messages are parents
of earlier messages, the messages were probably moved from some
other mailbox at different times.
2 -- 2
,- 4 - 23
3 - 6 -< -- 3
`- 44 - 7 - 96 \-- 6
|-- 4
| \-- 23
|
\-- 44
\-- 7
\-- 96
Example: S: * THREAD ((3)(5))
In this example, 3 and 5 are siblings of a parent which does not
exist in the mailbox; however they are members of the same thread.
Formal Syntax of THREAD commands and Responses Formal Syntax of THREAD commands and Responses
thread-data = "THREAD" SPACE 1*thread-list thread-data = "THREAD" SP 1*thread-list
thread-list = "(" thread-members / thread-nested ")" thread-list = "(" thread-members / thread-nested ")"
thread-members = nz-number *(SP nz-number) [SP thread-nested] thread-members = nz-number *(SP nz-number) [SP thread-nested]
thread-nested = 2*thread-list thread-nested = 2*thread-list
thread = ["UID" SPACE] "THREAD" SP thread-algorthm thread = ["UID" SP] "THREAD" SP thread-algorthm
SP search-charset 1*(SP search-key) SP search-charset 1*(SP search-key)
thread-algorithm = "ORDEREDSUBJECT" / atom thread-algorithm = "ORDEREDSUBJECT" / "REFERENCES" / atom
The following syntax describes subject extraction rules (1)-(5): The following syntax describes subject extraction rules (2)-(6):
subject = *(subj-leader / subj-blob) subj-base *subj-trailer subject = *subj-leader [subj-middle] *subj-trailer
subj-repeat = "[" 1*DIGIT "]" subj-refwd = ("re" / ("fw" ["d"])) *WSP [subj-blob] ":"
subj-refwd = ("re" / ("fw" ["d"])) [subj-repeat] ":" subj-blob = "[" *BLOBCHAR "]" *WSP
subj-leader = subj-refwd / WSP subj-fwd = subj-fwd-hdr subject subj-fwd-trl
subj-blob = "[" 1*BLOBCHAR "]" *WSP subj-fwd-hdr = "[fwd:"
subj-fwd-trl = "]"
subj-leader = (*subj-blob subj-refwd) / WSP
subj-middle = *subj-blob (subj-base / subj-fwd)
; last subj-blob is subj-base if subj-base would
; otherwise be empty
subj-trailer = "(fwd)" / WSP subj-trailer = "(fwd)" / WSP
subj-base = NONWSP *([WSP] NONWSP) subj-base = NONWSP *([*WSP] NONWSP)
; can be a subj-blob
BLOBCHAR = %x01-5c / %x5e-7f BLOBCHAR = %x01-5a / %x5c / %x5e-7f
; any CHAR except ']' ; any CHAR except '[' and ']'
NONWSP = %x01-08 / %x0a-1f / %x21-7f NONWSP = %x01-08 / %x0a-1f / %x21-7f
; any CHAR other than WSP ; any CHAR other than WSP
Security Considerations Security Considerations
Security issues are not discussed in this memo. Security issues are not discussed in this memo.
Internationalization Considerations Internationalization Considerations
skipping to change at page 7, line 33 skipping to change at page 12, line 5
Non-English translations of "Re" or "Fw"/"Fwd" are not specified for Non-English translations of "Re" or "Fw"/"Fwd" are not specified for
removal in the extracted subject text process. By specifying that removal in the extracted subject text process. By specifying that
only the English forms of the prefixes are used, it becomes a simple only the English forms of the prefixes are used, it becomes a simple
display time task to localize the prefix language for the user. If, display time task to localize the prefix language for the user. If,
on the other hand, prefixes in multiple languages are permitted, the on the other hand, prefixes in multiple languages are permitted, the
result is a geometrically complex, and ultimately unimplementable, result is a geometrically complex, and ultimately unimplementable,
task. In order to improve the ability to support non-English display task. In order to improve the ability to support non-English display
in Internet mail clients, only the English form of these prefixes in Internet mail clients, only the English form of these prefixes
should be transmitted in Internet mail messages. should be transmitted in Internet mail messages.
A. References
[ABNF] Crocker, D., and Overell, P. "Augmented BNF for Syntax
Specifications: ABNF", RFC 2234, November 1997.
[NEWS] Horton, M., and Adams, R., "Standard for interchange of USENET
messages", RFC-1036, AT&T Bell Laboratories and Center for Seismic
Studies, December, 1987.
Author's Address Author's Address
Mark R. Crispin Mark R. Crispin
Networks and Distributed Computing Networks and Distributed Computing
University of Washington University of Washington
4545 15th Aveneue NE 4545 15th Avenue NE
Seattle, WA 98105-4527 Seattle, WA 98105-4527
Phone: (206) 543-5762 Phone: (206) 543-5762
EMail: MRC@CAC.Washington.EDU EMail: MRC@CAC.Washington.EDU
Kenneth Murchison
Oceana Matrix Ltd.
21 Princeton Place
Orchard Park, NY 14127
Phone: (716) 662-8973 x26
EMail: ken@oceana.com
 End of changes. 

This html diff was produced by rfcdiff 1.23, available from http://www.levkowetz.com/ietf/tools/rfcdiff/