draft-ietf-imapext-sort-19.txt   draft-ietf-imapext-sort-20.txt 
IMAP Extensions Working Group M. Crispin IMAP Extensions Working Group M. Crispin
INTERNET-DRAFT: IMAP SORT K. Murchison Internet-Draft K. Murchison
Document: internet-drafts/draft-ietf-imapext-sort-19.txt November 2006 Intended status: Proposed Standard March 10, 2008
Expires: September 10, 2008
Document: internet-drafts/draft-ietf-imapext-sort-20.txt
INTERNET MESSAGE ACCESS PROTOCOL - SORT AND THREAD EXTENSIONS INTERNET MESSAGE ACCESS PROTOCOL - SORT AND THREAD EXTENSIONS
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that By submitting this Internet-Draft, each author represents that
any applicable patent or other IPR claims of which he or she is any applicable patent or other IPR claims of which he or she is
aware have been or will be disclosed, and any of which he or she aware have been or will be disclosed, and any of which he or she
becomes aware will be disclosed, in accordance with Section 6 of becomes aware will be disclosed, in accordance with Section 6 of
BCP 79. BCP 79.
skipping to change at line 32 skipping to change at line 33
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
A revised version of this document will be submitted to the RFC
editor as an Informational Document for the Internet Community.
A revised version of this draft document will be submitted to the RFC A revised version of this draft document will be submitted to the RFC
editor as a Proposed Standard for the Internet Community. Discussion editor as a Proposed Standard for the Internet Community. Discussion
and suggestions for improvement are requested, and should be sent to and suggestions for improvement are requested, and should be sent to
ietf-imapext@IMC.ORG. This document will expire before 20 May 2007. ietf-imapext@IMC.ORG.
Distribution of this memo is unlimited. Distribution of this memo is unlimited.
Abstract Abstract
This document describes the base-level server-based sorting and This document describes the base-level server-based sorting and
threading extensions to the [IMAP] protocol. These extensions threading extensions to the [IMAP] protocol. These extensions
provide substantial performance improvements for IMAP clients which provide substantial performance improvements for IMAP clients which
offer sorted and threaded views. offer sorted and threaded views.
1. Introduction 1. Introduction
skipping to change at line 65 skipping to change at line 64
A server which supports the base-level SORT extension indicates this A server which supports the base-level SORT extension indicates this
with a capability name which starts with "SORT". Future, with a capability name which starts with "SORT". Future,
upwards-compatible extensions to the SORT extension will all start upwards-compatible extensions to the SORT extension will all start
with "SORT", indicating support for this base level. with "SORT", indicating support for this base level.
A server which supports the THREAD extension indicates this with one A server which supports the THREAD extension indicates this with one
or more capability names consisting of "THREAD=" followed by a or more capability names consisting of "THREAD=" followed by a
supported threading algorithm name as described in this document. supported threading algorithm name as described in this document.
This provides for future upwards-compatible extensions. This provides for future upwards-compatible extensions.
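The capability rules above can be illustrated from the client side. The following is a minimal sketch, not part of either draft: the function name sort_thread_support and the list-of-strings input are assumptions made for illustration, and the capability names are taken to have been tokenized already from a CAPABILITY response.

```python
def sort_thread_support(capabilities):
    """Return (has_sort, thread_algorithms) for a list of capability names.

    Hypothetical helper: 'capabilities' is assumed to be the already
    tokenized capability list from a CAPABILITY response.
    """
    names = [name.upper() for name in capabilities]
    # Any capability name starting with "SORT" implies base-level SORT.
    has_sort = any(name.startswith("SORT") for name in names)
    # Each "THREAD=<algorithm>" name advertises one threading algorithm.
    prefix = "THREAD="
    algorithms = [name[len(prefix):] for name in names
                  if name.startswith(prefix)]
    return has_sort, algorithms
```

A client would typically check the returned algorithm list before issuing a THREAD command with a given algorithm name.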
A server which implements the SORT and/or THREAD extensions SHOULD A server which implements the SORT and/or THREAD extensions MUST
also implement the COMPARATOR extension as described in [IMAP-I18N]. collate strings in accordance with the requirements of I18NLEVEL=1,
as described in [IMAP-I18N], and SHOULD implement and advertise the
I18NLEVEL=1 extension. Alternatively, a server MAY implement
I18NLEVEL=2 (or higher) and comply with the rules of that level.
Discussion: the SORT and THREAD extensions predate [IMAP-I18N] by
several years. At the time of this writing, all known server
implementations of SORT and THREAD comply with the rules of
I18NLEVEL=1, but do not necessarily advertise it. As discussed
in [IMAP-I18N] section 4.5, all server implementations should
eventually be updated to comply with the I18NLEVEL=2 extension.
Historical note: the REFERENCES threading algorithm is based on the
[THREADING] algorithm used in "Netscape Mail and News"
versions 2.0 through 3.0.
2. Terminology 2. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [KEYWORDS]. document are to be interpreted as described in [KEYWORDS].
The word "can" (not "may") is used to refer to a possible The word "can" (not "may") is used to refer to a possible
circumstance or situation, as opposed to an optional facility of the circumstance or situation, as opposed to an optional facility of the
protocol. protocol.
skipping to change at line 142 skipping to change at line 155
If the time zone is invalid, the date and time SHOULD be treated as If the time zone is invalid, the date and time SHOULD be treated as
UTC. If the time is also invalid, the time SHOULD be treated as UTC. If the time is also invalid, the time SHOULD be treated as
00:00:00. If there is no valid date or time, the date and time 00:00:00. If there is no valid date or time, the date and time
SHOULD be treated as 00:00:00 on the earliest possible date. SHOULD be treated as 00:00:00 on the earliest possible date.
This differs from the date-related criteria in the SEARCH command This differs from the date-related criteria in the SEARCH command
(described in [IMAP] section 6.4.4), which use just the date and not (described in [IMAP] section 6.4.4), which use just the date and not
the time, and are not adjusted by time zone. the time, and are not adjusted by time zone.
If the sent date can not be determined (a Date: header is missing or
can not be parsed), the INTERNALDATE for that message is used as the
sent date.
When comparing two sent dates that match exactly, the order in which
the two messages appear in the mailbox (that is, by sequence number)
is used as a tie-breaker to determine the order.
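The sent date rules can be sketched as a sort key. This is only an illustration under stated assumptions: the dict fields "date", "internaldate" and "seq" are hypothetical names, INTERNALDATE is assumed to be available as a timezone-aware datetime, and a Date: header that cannot be parsed as a whole falls back to INTERNALDATE. The finer cases (a valid date with an invalid time or time zone) are only partly covered: a date carrying no usable zone is treated as UTC.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def sent_date(date_header, internaldate):
    """Return the sent date as an aware UTC datetime.

    Falls back to INTERNALDATE when the Date: header is missing or
    cannot be parsed (field names here are illustrative assumptions).
    """
    if not date_header:
        return internaldate
    try:
        parsed = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        return internaldate
    if parsed is None:
        return internaldate
    if parsed.tzinfo is None:
        # No usable time zone information: treat the value as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def sort_by_sent_date(messages):
    """Sort by sent date, using the sequence number as the tie-breaker."""
    return sorted(
        messages,
        key=lambda m: (sent_date(m["date"], m["internaldate"]), m["seq"]),
    )
```

Normalizing every value to UTC before comparing is what makes two headers that name the same instant in different zones compare as an exact match, so that the sequence number decides their order.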
3. Additional Commands 3. Additional Commands
These commands are extensions to the [IMAP] base protocol. These commands are extensions to the [IMAP] base protocol.
The section headings are intended to correspond with where they would The section headings are intended to correspond with where they would
be located in the main document if they were part of the base be located in the main document if they were part of the base
specification. specification.
BASE.6.4.SORT. SORT Command BASE.6.4.SORT. SORT Command
skipping to change at line 221 skipping to change at line 242
string always collates before non-empty strings. string always collates before non-empty strings.
ARRIVAL ARRIVAL
Internal date and time of the message. This differs from the Internal date and time of the message. This differs from the
ON criteria in SEARCH, which uses just the internal date. ON criteria in SEARCH, which uses just the internal date.
CC CC
[IMAP] addr-mailbox of the first "cc" address. [IMAP] addr-mailbox of the first "cc" address.
DATE DATE
Sent date and time from the Date: header, adjusted by time Sent date and time, as described in section 2.2.
zone. This differs from the SENTON criteria in SEARCH, which
uses just the date and not the time, nor adjusts by time zone.
FROM FROM
[IMAP] addr-mailbox of the first "From" address. [IMAP] addr-mailbox of the first "From" address.
REVERSE REVERSE
Followed by another sort criterion, has the effect of that Followed by another sort criterion, has the effect of that
criterion but in reverse (descending) order. criterion but in reverse (descending) order.
Note: REVERSE only reverses a single criterion, and does not Note: REVERSE only reverses a single criterion, and does not
affect the implicit "sequence number" sort criterion if all affect the implicit "sequence number" sort criterion if all
other criteria are identical. Consequently, a sort of other criteria are identical. Consequently, a sort of
skipping to change at line 323 skipping to change at line 342
messages with the same base subject text. Finally, the threads messages with the same base subject text. Finally, the threads
are sorted by the sent date of the first message in the thread. are sorted by the sent date of the first message in the thread.
The first messages of the threads are siblings of each other The first messages of the threads are siblings of each other
(the "root"). The second message of a thread is the child of (the "root"). The second message of a thread is the child of
the first message, and subsequent messages of the thread are the first message, and subsequent messages of the thread are
siblings of the second message and hence children of the siblings of the second message and hence children of the
message at the root. Hence, there are no grandchildren in message at the root. Hence, there are no grandchildren in
ORDEREDSUBJECT threading. ORDEREDSUBJECT threading.
Note: early drafts of this specification specified Children in ORDEREDSUBJECT threading do not have descendents.
that each message in an ORDEREDSUBJECT thread is a child Client implementations SHOULD treat descendents of a child in
(as opposed to a sibling) of the previous message. This a server response as being siblings of that child.
is now deprecated. For compatibility with servers which
may still use the old definition, client implementations
SHOULD treat descendents of a child as being siblings of
that child.
This is because the old definition mistakenly indicated
that there was a parent/child relationship between
successive messages in a thread; when in fact there was
only a chronological relationship. In clients which
indicate parent/child relationships in a thread tree,
this would indicate levels of descent which did not
exist.
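The ORDEREDSUBJECT thread structure described above (one root per base subject, with every later message a direct child of that root) can be sketched as follows. This is an illustration only: the dict fields "seq", "sent" and "base_subject" are hypothetical, the base subject is assumed to have been extracted already, "sent" is any comparable timestamp, and str.casefold() stands in for the collation the drafts actually require.

```python
from collections import defaultdict


def ordered_subject_threads(messages):
    """Group messages into threads by base subject, ordered by sent date."""
    groups = defaultdict(list)
    for msg in messages:
        # casefold() is only an approximation of the required collation.
        groups[msg["base_subject"].casefold()].append(msg)
    threads = []
    for group in groups.values():
        # Within a thread: sent date, then sequence number as tie-breaker.
        group.sort(key=lambda m: (m["sent"], m["seq"]))
        threads.append(group)
    # Threads are ordered by the sent date of their first message.
    threads.sort(key=lambda t: (t[0]["sent"], t[0]["seq"]))
    return threads


def format_thread(thread):
    """Render one thread: the root, then its children as siblings."""
    ids = [str(m["seq"]) for m in thread]
    if len(ids) == 1:
        return "(%s)" % ids[0]
    if len(ids) == 2:
        return "(%s %s)" % (ids[0], ids[1])
    return "(%s %s)" % (ids[0], "".join("(%s)" % i for i in ids[1:]))


def thread_response(messages):
    """Build the parenthesized thread list for a set of messages."""
    return "".join(format_thread(t) for t in ordered_subject_threads(messages))
```

Because every message after the first is a child of the root, the output never nests deeper than one level, which matches the statement that there are no grandchildren in ORDEREDSUBJECT threading.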
REFERENCES REFERENCES
The REFERENCES threading algorithm is based on the [THREADING] The REFERENCES threading algorithm threads the searched
algorithm used in "Netscape Mail and News" versions 2.0 messages by grouping them together in parent/child
through 3.0. This algorithm threads the searched messages by relationships based on which messages are replies to others.
grouping them together in parent/child relationships based on The parent/child relationships are built using two methods:
which messages are replies to others. The parent/child reconstructing a message's ancestry using the references
relationships are built using two methods: reconstructing a contained within it; and checking the original (not base)
message's ancestry using the references contained within it; subject of a message to see if it is a reply to (or forward of)
and checking the original (not base) subject of a message to another message.
see if it is a reply to (or forward of) another message.
Note: "Message ID" in the following description refers to a Note: "Message ID" in the following description refers to a
normalized form of the msg-id in [RFC-2822]. The actual normalized form of the msg-id in [RFC-2822]. The actual
text in an RFC 2822 message may use quoting, resulting in multiple text in an RFC 2822 message may use quoting, resulting in multiple
ways of expressing the same Message ID. Implementations of ways of expressing the same Message ID. Implementations of
the REFERENCES threading algorithm MUST normalize any msg-id the REFERENCES threading algorithm MUST normalize any msg-id
in order to avoid false non-matches due to differences in in order to avoid false non-matches due to differences in
quoting. quoting.
For example, the msg-id For example, the msg-id
skipping to change at line 469 skipping to change at line 475
If it is a dummy message with NO children, delete it. If it is a dummy message with NO children, delete it.
If it is a dummy message with children, delete it, but If it is a dummy message with children, delete it, but
promote its children to the current level. In other words, promote its children to the current level. In other words,
splice them in with the dummy's siblings. splice them in with the dummy's siblings.
Do not promote the children if doing so would make them Do not promote the children if doing so would make them
children of the root, unless there is only one child. children of the root, unless there is only one child.
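The dummy-pruning rules of step (3) can be sketched as a recursive walk. The Container class below is a hypothetical data structure, not something defined by either draft: a container whose message is None is a dummy, and the root is itself a message-less container that is never deleted.

```python
class Container:
    """Hypothetical tree node; message is None for a dummy."""

    def __init__(self, message=None, children=None):
        self.message = message
        self.children = children or []


def prune(parent, is_root=True):
    """Remove dummy containers below 'parent', promoting their children."""
    kept = []
    for child in parent.children:
        # Prune the subtree first, so promoted children are already clean.
        prune(child, is_root=False)
        if child.message is None:
            if not child.children:
                continue  # dummy with no children: delete it
            if not is_root or len(child.children) == 1:
                # Splice the dummy's children in at the current level.
                kept.extend(child.children)
                continue
            # Otherwise promotion would add several children to the
            # root, so the dummy stays in place.
        kept.append(child)
    parent.children = kept
```

Under these rules a dummy can survive only directly under the root, and only when it has two or more children.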
(4) Sort the messages under the root (top-level siblings only) (4) Sort the messages under the root (top-level siblings only)
by sent date. In the case of an exact match on sent date, use by sent date as described in section 2.2. In the case of a
the order in which the messages appear in the mailbox (that is,
by sequence number) to determine the order. In the case of a
dummy message, sort its children by sent date and then use the dummy message, sort its children by sent date and then use the
first child for the top-level sort. If the sent date can not first child for the top-level sort.
be determined (a Date: header is missing or can not be parsed),
the INTERNALDATE for that message is used as the sent date.
(5) Gather together messages under the root that have the same (5) Gather together messages under the root that have the same
base subject text. base subject text.
(A) Create a table for associating base subjects with (A) Create a table for associating base subjects with
messages, called the subject table. messages, called the subject table.
(B) Populate the subject table with one message per each (B) Populate the subject table with one message per each
base subject. For each child of the root: base subject. For each child of the root:
skipping to change at line 553 skipping to change at line 555
Otherwise, create a new dummy message and make both Otherwise, create a new dummy message and make both
the current message and the message in the subject the current message and the message in the subject
table children of the dummy. Then replace the message table children of the dummy. Then replace the message
in the subject table with the dummy message. in the subject table with the dummy message.
Note: Subject comparisons are case-insensitive, as Note: Subject comparisons are case-insensitive, as
described under "Internationalization described under "Internationalization
Considerations." Considerations."
(6) Traverse the messages under the root and sort each set of (6) Traverse the messages under the root and sort each set of
siblings by sent date. Traverse the messages in such a way siblings by sent date as described in section 2.2. Traverse
that the "youngest" set of siblings are sorted first, and the the messages in such a way that the "youngest" set of siblings
"oldest" set of siblings are sorted last (grandchildren are are sorted first, and the "oldest" set of siblings are sorted
sorted before children, etc). In the case of an exact match on last (grandchildren are sorted before children, etc). In the
sent date or if either of the Date: headers used in a case of a dummy message (which can only occur with top-level
comparison can not be parsed, use the order in which the siblings), use its first child for sorting.
messages appear in the mailbox (that is, by sequence number) to
determine the order. In the case of a dummy message (which can
only occur with top-level siblings), use its first child for
sorting.
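The sibling sort of step (6) can be sketched as a bottom-up traversal. The Node class is a hypothetical structure for illustration: "sent" is any comparable timestamp, or None for a dummy, and a dummy is assumed to have at least one child (as is the case after pruning).

```python
class Node:
    """Hypothetical tree node; sent is None for a dummy message."""

    def __init__(self, seq, sent, children=None):
        self.seq = seq
        self.sent = sent
        self.children = children or []


def sort_key(node):
    """Sent date with sequence number as tie-breaker.

    A dummy is ordered by its first child, which is already in place
    because children are sorted before their parents.
    """
    target = node if node.sent is not None else node.children[0]
    return (target.sent, target.seq)


def sort_siblings(node):
    """Sort every set of siblings under 'node', deepest levels first."""
    for child in node.children:
        sort_siblings(child)
    node.children.sort(key=sort_key)
```

Recursing before sorting is what gives the "grandchildren are sorted before children" order, and it guarantees that a dummy's first child is final by the time the dummy itself is compared.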
Example: C: A283 THREAD ORDEREDSUBJECT UTF-8 SINCE 5-MAR-2000 Example: C: A283 THREAD ORDEREDSUBJECT UTF-8 SINCE 5-MAR-2000
S: * THREAD (166)(167)(168)(169)(172)(170)(171) S: * THREAD (166)(167)(168)(169)(172)(170)(171)
(173)(174 (175)(176)(178)(181)(180))(179)(177 (173)(174 (175)(176)(178)(181)(180))(179)(177
(183)(182)(188)(184)(185)(186)(187)(189))(190) (183)(182)(188)(184)(185)(186)(187)(189))(190)
(191)(192)(193)(194 195)(196 (197)(198))(199) (191)(192)(193)(194 195)(196 (197)(198))(199)
(200 202)(201)(203)(204)(205)(206 207)(208) (200 202)(201)(203)(204)(205)(206 207)(208)
S: A283 OK THREAD completed S: A283 OK THREAD completed
C: A284 THREAD ORDEREDSUBJECT US-ASCII TEXT "gewp" C: A284 THREAD ORDEREDSUBJECT US-ASCII TEXT "gewp"
S: * THREAD S: * THREAD
skipping to change at line 729 skipping to change at line 727
6. Security Considerations 6. Security Considerations
The SORT and THREAD extensions do not raise any security The SORT and THREAD extensions do not raise any security
considerations that are not present in the base [IMAP] protocol, and considerations that are not present in the base [IMAP] protocol, and
these issues are discussed in [IMAP]. Nevertheless, it is important these issues are discussed in [IMAP]. Nevertheless, it is important
to remember that [IMAP] protocol transactions, including message to remember that [IMAP] protocol transactions, including message
data, are sent in the clear over the network unless protection from data, are sent in the clear over the network unless protection from
snooping is negotiated, either by the use of STARTTLS, privacy snooping is negotiated, either by the use of STARTTLS, privacy
protection is negotiated in the AUTHENTICATE command, or some other protection is negotiated in the AUTHENTICATE command, or some other
protection mechanism is in effect. protection mechanism.
Although not a security consideration, it is important to recognize
that sorting by REFERENCES can lead to misleading threading trees.
For example, a message with false References: header data will cause
a thread to be incorporated into another thread.
The process of extracting the base subject may lead to incorrect
collation if the extracted data was significant text as opposed to
a subject artifact.
7. Internationalization Considerations 7. Internationalization Considerations
As described in [IMAP-I18N], strings in charsets other than US-ASCII As stated in the introduction, the rules of I18NLEVEL=1 as described
and UTF-8 MUST be converted to UTF-8 and compared in ascending order in [IMAP-I18N] MUST be followed; that is, the SORT and THREAD
according to the selected or active collation algorithm. If the extensions MUST collate strings according to the i;unicode-casemap
server does not support the [IMAP-I18N] COMPARATOR extension, the collation described in [UNICASEMAP]. Servers SHOULD also advertise
collation algorithm used is the "en;ascii-casemap" collation the I18NLEVEL=1 extension. Alternatively, a server MAY implement
described in [COMPARATOR]. I18NLEVEL=2 (or higher) and comply with the rules of that level.
As discussed in [IMAP-I18N] section 4.5, all server implementations
should eventually be updated to support the [IMAP-I18N] I18NLEVEL=2
extension.
Translations of the "re" or "fw"/"fwd" tokens are not specified for Translations of the "re" or "fw"/"fwd" tokens are not specified for
removal in the base subject extraction process. An attempt to add removal in the base subject extraction process. An attempt to add
such translated tokens would result in a geometrically complex, and such translated tokens would result in a geometrically complex, and
ultimately unimplementable, task. ultimately unimplementable, task.
Instead, note that [RFC-2822] section 3.6.5 recommends that "re:" Instead, note that [RFC-2822] section 3.6.5 recommends that "re:"
(from the Latin "res", in the matter of) be used to identify a reply. (from the Latin "res", in the matter of) be used to identify a reply.
Although it is evident that, from the multiple forms of token to Although it is evident that, from the multiple forms of token to
identify a forwarded message, there is considerable variation found identify a forwarded message, there is considerable variation found
skipping to change at line 773 skipping to change at line 784
which registers threading algorithms by publishing a standards track which registers threading algorithms by publishing a standards track
or IESG approved experimental RFC. This document constitutes or IESG approved experimental RFC. This document constitutes
registration of the ORDEREDSUBJECT and REFERENCES algorithms in that registration of the ORDEREDSUBJECT and REFERENCES algorithms in that
registry. registry.
9. Normative References 9. Normative References
The following documents are normative to this document: The following documents are normative to this document:
[ABNF] Crocker, D. and Overell, P. "Augmented BNF [ABNF] Crocker, D. and Overell, P. "Augmented BNF
for Syntax Specifications: ABNF", RFC 4234 for Syntax Specifications: ABNF", RFC 5234
October 2005. January 2008
[CHARSET] Freed, N. and J. Postel, "IANA Character Set [CHARSET] Freed, N. and Postel, J. "IANA Character Set
Registration Procedures", RFC 2978, October Registration Procedures", RFC 2978, October
2000. 2000.
[COMPARATOR] Newman, C. "Internet Application Protocol
Collation Registry", Work in Progress.
[IMAP] Crispin, M. "Internet Message Access Protocol - [IMAP] Crispin, M. "Internet Message Access Protocol -
Version 4rev1", RFC 3501, March 2003. Version 4rev1", RFC 3501, March 2003.
[IMAP-I18N] Newman, C. "Internet Message Access Protocol [IMAP-I18N] Newman, C. and Gulbrandsen, A. "Internet
Internationalization", Work in Progress. Message Access Protocol Internationalization",
Work in Progress.
[KEYWORDS] Bradner, S. "Key words for use in RFCs to [KEYWORDS] Bradner, S. "Key words for use in RFCs to
Indicate Requirement Levels", BCP 14, RFC 2119, Indicate Requirement Levels", BCP 14, RFC 2119,
March 1997. March 1997.
[RFC-2822] Resnick, P. "Internet Message Format", RFC [RFC-2822] Resnick, P. "Internet Message Format", RFC
2822, April 2001. 2822, April 2001.
[UNICASEMAP] Crispin, M. "i;unicode-casemap - Simple Unicode
Collation Algorithm", RFC 5051.
10. Informative References 10. Informative References
The following documents are informative to this document: The following documents are informative to this document:
[IMAP-MODELS] Crispin, M. "Distributed Electronic Mail Models [IMAP-MODELS] Crispin, M. "Distributed Electronic Mail Models
in IMAP4", RFC 1733, December 1994. in IMAP4", RFC 1733, December 1994.
[THREADING] Zawinski, J. "Message Threading", [THREADING] Zawinski, J. "Message Threading",
http://www.jwz.org/doc/threading.html, http://www.jwz.org/doc/threading.html,
1997-2002. 1997-2002.
skipping to change at line 832 skipping to change at line 844
Carnegie Mellon University Carnegie Mellon University
5000 Forbes Avenue 5000 Forbes Avenue
Cyert Hall 285 Cyert Hall 285
Pittsburgh, PA 15213 Pittsburgh, PA 15213
Phone: +1 (412) 268-2638 Phone: +1 (412) 268-2638
Email: murch@andrew.cmu.edu Email: murch@andrew.cmu.edu
Full Copyright Statement Full Copyright Statement
Copyright (C) The Internet Society (2006). Copyright (C) The IETF Trust (2008).
This document is subject to the rights, licenses and restrictions This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors contained in BCP 78, and except as set forth therein, the authors
retain all their rights. retain all their rights.
This document and the information contained herein are provided on an This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Intellectual Property Intellectual Property
The IETF takes no position regarding the validity or scope of any The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information made any independent effort to identify any such rights. Information
 End of changes. 21 change blocks. 
72 lines changed or deleted 84 lines changed or added
