Message Organization Working Group                           T. Sirainen
Internet-Draft                                              May 24, 2009
Intended status: Standards Track
Expires: November 25, 2009


                    IMAP4 Extension for Fuzzy Search
                    draft-ietf-morg-fuzzy-search-01

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on November 25, 2009.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

Abstract

   This document describes an IMAP protocol extension that enables a
   server to perform searches with inexact matching and to assign
   relevancy scores to matched messages.



Sirainen                Expires November 25, 2009               [Page 1]

Internet-Draft             IMAP4 FUZZY SEARCH                   May 2009


Note

   A revised version of this draft document will be submitted to the RFC
   editor as a Proposed Standard for the Internet Community.  Discussion
   and suggestions for improvement are requested, and should be sent to
   morg@ietf.org.


1.  Conventions used in this document

   In examples, "C:" indicates lines sent by a client that is connected
   to a server.  "S:" indicates lines sent by the server to the client.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [Kwds].


2.  Introduction

   When humans perform searches in IMAP clients, they typically want to
   see the most relevant search results first.  IMAP servers can do this
   most efficiently when they are free to decide internally how searches
   should match messages.  This document describes a new SEARCH=FUZZY
   extension that provides such functionality.


3.  The FUZZY Search Key

   The FUZZY search key takes another search key as its argument.  The
   server is allowed to perform all matching for this search key in an
   implementation-defined manner.  Typically this would be used to
   search for strings, for example:

   C: A01 SEARCH FUZZY (SUBJECT "IMAP break")
   S: * SEARCH 1 5 10
   S: A01 OK Search completed.

   Besides matching messages with the subject "IMAP break", the above
   search may also match messages with subjects such as "broken IMAP" or
   "IMAP is broken", or anything else the server decides might be a good
   match.
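
   As an illustration of the client side, the following minimal sketch
   builds the command shown above.  The helper names quote_astring and
   fuzzy_search_command are hypothetical and not part of this
   specification, and the sketch assumes the search value is plain ASCII
   text that fits in an IMAP quoted string.

```python
def quote_astring(value: str) -> str:
    """Quote a value as an IMAP quoted string (hypothetical helper).

    Only ASCII text without CR or LF is accepted; anything else would
    need an IMAP literal, which this sketch does not implement.
    """
    if not value.isascii() or "\r" in value or "\n" in value:
        raise ValueError("value needs an IMAP literal")
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def fuzzy_search_command(tag: str, key: str, value: str) -> str:
    """Wrap a single string search key in FUZZY and return the command line."""
    return f"{tag} SEARCH FUZZY ({key} {quote_astring(value)})"


print(fuzzy_search_command("A01", "SUBJECT", "IMAP break"))
```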


4.  Relevancy Scores for Search Results

   Servers SHOULD assign a search relevancy score to each matched
   message when the FUZZY search key is given.  Relevancy scores are
   given in the range 1-100, where 100 is the highest relevancy.  The
   relevancy scores SHOULD use the full 1-100 range, so that clients can
   show them to users in a meaningful way, such as a percentage value.

   As the name implies, a relevancy score specifies how relevant the
   matched message is to the search.  This is not necessarily the same
   as how precisely the message matched.  For example, a message whose
   subject fuzzily matches the search string might get a higher
   relevancy score than a message whose body contains the exact string
   in the middle of a sentence.

   If the server advertises the ESEARCH capability as defined by
   [ESEARCH], the relevancy scores can be retrieved using the new
   RELEVANCY return option for SEARCH:

   C: A02 SEARCH RETURN (RELEVANCY ALL) FUZZY TEXT "Helo"
   S: * ESEARCH (TAG "A02") ALL 1,5,10 RELEVANCY (4 99 42)
   S: A02 OK Search completed.

   The RELEVANCY return option MUST NOT be used unless a FUZZY search
   key is also given.
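
   A client might pair each returned message number with its score as
   in the sketch below.  It assumes, based on the example above, that
   the n-th score in the RELEVANCY list belongs to the n-th message in
   the ALL result.  The function names are hypothetical, and only the
   response shape shown in the example is handled.

```python
import re


def expand_sequence_set(seq_set: str) -> list[int]:
    """Expand a sequence set such as "1,5,10" or "2:4" in the order given."""
    numbers = []
    for part in seq_set.split(","):
        if ":" in part:
            first, last = (int(n) for n in part.split(":"))
            step = 1 if last >= first else -1
            numbers.extend(range(first, last + step, step))
        else:
            numbers.append(int(part))
    return numbers


def parse_relevancy(line: str) -> list[tuple[int, int]]:
    """Return (message number, score) pairs from one ESEARCH response line."""
    match = re.search(r"\bALL (\S+) RELEVANCY \(([0-9 ]*)\)", line)
    if match is None:
        raise ValueError("no ALL/RELEVANCY data in response")
    messages = expand_sequence_set(match.group(1))
    scores = [int(s) for s in match.group(2).split()]
    if len(messages) != len(scores):
        raise ValueError("score count does not match message count")
    return list(zip(messages, scores))


response = '* ESEARCH (TAG "A02") ALL 1,5,10 RELEVANCY (4 99 42)'
print(parse_relevancy(response))
```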


5.  Fuzzy matching with non-string search keys

   Fuzzy matching is not limited to just string matching.  All search
   keys SHOULD be matched fuzzily, although what exactly that means for
   different search keys is left up to implementations to decide.  Some
   suggestions are given below.

   Dates: A typical example could be when a user wants to find a message
   "from Dave about a week ago".  A client could perform this search
   using SEARCH FUZZY (FROM "Dave" SINCE 21-Jan-2009 BEFORE
   24-Jan-2009).  The server could return messages outside the specified
   date range, but the further away the message is, the lower the
   relevancy score.

   Sizes: These should be handled similarly to dates.  If a user wants
   to search for "about 1 MB attachments", the client could do this by
   sending SEARCH FUZZY (LARGER 900000 SMALLER 1100000).  Again, the
   further away the message size is from the specified range, the lower
   the relevancy score.

   Flags: The server could return messages that don't have the specified
   flags, but with a lower relevancy score.

   UIDs, sequences, modification sequences: [[anchor6: TODO: There is
   probably not much point in matching these fuzzily.  Could it be a
   MUST NOT for clients to use them inside FUZZY?]]
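
   The date suggestion above could be realized in many ways, since the
   matching is implementation-defined.  The following sketch is one
   hypothetical approach, not a required algorithm: dates inside the
   range score 100, and the score falls off linearly with the distance
   in days from the range.  The function name, the linear shape, and
   the falloff_days parameter are all assumptions made for illustration.

```python
from datetime import date


def date_relevancy(msg_date: date, since: date, before: date,
                   falloff_days: int = 14) -> int:
    """Score a message date against SINCE <since> BEFORE <before>.

    Returns 100 inside the range, a value in 1-99 near the range, and
    0 (meaning "do not return this message") at falloff_days or beyond.
    """
    # SINCE is inclusive and BEFORE is exclusive.
    if since <= msg_date < before:
        return 100
    if msg_date < since:
        distance = (since - msg_date).days
    else:
        # The last in-range day is the day preceding <before>.
        distance = (msg_date - before).days + 1
    if distance >= falloff_days:
        return 0
    return max(1, round(100 * (1 - distance / falloff_days)))


since, before = date(2009, 1, 21), date(2009, 1, 24)
for day in (date(2009, 1, 22), date(2009, 1, 24), date(2009, 1, 30)):
    print(day.isoformat(), date_relevancy(day, since, before))
```

   A size range such as LARGER 900000 SMALLER 1100000 could be scored
   the same way, with the distance measured in octets instead of days.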


6.  Extensions to SORT

   If the server advertises the SORT capability as defined by [SORT],
   the results can be sorted by the new RELEVANCY sort criterion:

   C: A03 SORT (RELEVANCY) UTF-8 FUZZY SUBJECT "Helo"
   S: * SORT 5 10 1
   S: A03 OK Sort completed.

   The message with the highest score is returned first.  As with the
   RELEVANCY return option, the RELEVANCY sort criterion MUST NOT be
   used unless a FUZZY search key is also given.

   If the server advertises the ESORT capability as defined by
   [CONTEXT], the relevancy scores can be retrieved using the new
   RELEVANCY return option for SORT:

   C: A04 SORT RETURN (RELEVANCY ALL) (RELEVANCY) UTF-8 FUZZY TEXT "Helo"
   S: * ESEARCH (TAG "A04") ALL 5,10,1 RELEVANCY (99 42 4)
   S: A04 OK Sort completed.
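
   The ordering in the examples can be reproduced with the sketch
   below, which sorts (message number, score) pairs from highest to
   lowest score.  The function name is hypothetical.  How messages with
   equal scores are ordered is not specified by this document; the
   sketch simply keeps their input order.

```python
def sort_by_relevancy(scored: list[tuple[int, int]]) -> list[int]:
    """Order message numbers from highest to lowest relevancy score.

    Python's sort is stable, so messages with equal scores keep the
    order they had in the input (an assumption of this sketch).
    """
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [message for message, _score in ranked]


# Messages 1, 5 and 10 with the scores 4, 99 and 42 from Section 4.
print(sort_by_relevancy([(1, 4), (5, 99), (10, 42)]))
```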


7.  Formal Syntax

   The following syntax specification uses the Augmented Backus-Naur
   Form (ABNF) as described in [ABNF].  It includes definitions from
   [RFC3501], [IMAP-ABNF] and [SORT].

       capability         =/ "SEARCH=FUZZY"

       score              = 1*3DIGIT
          ;; (1 <= n <= 100)

       score-list         = "(" [score *(SP score)] ")"

       search-key         =/ "FUZZY" SP search-key

       search-return-data =/ "RELEVANCY" SP score-list
          ;; Conforms to <search-return-data>, from [IMAP-ABNF]

       search-return-opt  =/ "RELEVANCY"
          ;; Conforms to <search-return-opt>, from [IMAP-ABNF]

       sort-key           =/ "RELEVANCY"
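
   The score and score-list rules can be checked on the receiving side
   as in the following sketch.  It is an illustration rather than a
   normative parser, and the names SCORE_LIST and parse_score_list are
   hypothetical.  The regular expression mirrors the ABNF (one to three
   digits, single SP separators), and the 1-100 range from the ABNF
   comment is checked separately.

```python
import re

# "(" [score *(SP score)] ")" with score = 1*3DIGIT
SCORE_LIST = re.compile(r"\((?:[0-9]{1,3}(?: [0-9]{1,3})*)?\)")


def parse_score_list(text: str) -> list[int]:
    """Parse a score-list and enforce the 1 <= n <= 100 constraint."""
    if SCORE_LIST.fullmatch(text) is None:
        raise ValueError(f"not a score-list: {text!r}")
    scores = [int(token) for token in text[1:-1].split()]
    for score in scores:
        if not 1 <= score <= 100:
            raise ValueError(f"score out of range: {score}")
    return scores


print(parse_score_list("(4 99 42)"))
```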




8.  Security Considerations

   This document is believed not to have any security implications.


9.  IANA Considerations

   IMAP4 capabilities are registered by publishing a standards-track or
   IESG-approved experimental RFC.  The registry is currently located
   at:


      http://www.iana.org/assignments/imap4-capabilities


   This document defines the X-DRAFT-I00-SEARCH=FUZZY [[anchor8: Note to
   RFC Editor: fix before publication]] IMAP capability.  IANA is
   requested to add it to the registry.


10.  Acknowledgements

   Alexey Melnikov has helped with this document.


11.  Normative References

   [ABNF]     Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", RFC 5234, January 2008.

   [CONTEXT]  Cridland, D. and C. King, "Contexts for IMAP4", RFC 5267,
              July 2008.

   [ESEARCH]  Melnikov, A. and D. Cridland, "IMAP4 Extension to SEARCH
              Command for Controlling What Kind of Information Is
              Returned", RFC 4731, November 2006.

   [IMAP-ABNF]
              Melnikov, A. and C. Daboo, "Collected Extensions to IMAP4
              ABNF", RFC 4466, April 2006.

   [Kwds]     Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", RFC 2119, March 1997.

   [RFC3501]  Crispin, M., "INTERNET MESSAGE ACCESS PROTOCOL - VERSION
              4rev1", RFC 3501, March 2003.

   [SORT]     Crispin, M. and K. Murchison, "Internet Message Access
              Protocol - SORT and THREAD Extensions", RFC 5256,
              June 2008.


Author's Address

   Timo Sirainen

   Email: tss@iki.fi