Recently I moved my eMail from an old server I was managing myself to a cheap but nice shared web host.
Unfortunately, the new host does not cope well with the 50k+ eMails in my inbox: search does not work. At all. I need search. Search was never great before either, whether in Roundcube, Thunderbird or mutt, and I couldn’t get the hang of notmuch.
For a long time I have wanted something better all around: search with boolean terms, resilient against typos, featuring stemming, maybe even coping with synonyms.
I wanted to use this as a learning opportunity and also to try out some technology I had been longing to play with for quite a while: Clojure and Lucene.
After fiddling around for half a day, here is my prototype: fast and feature-rich full-text IMAP mail search in a couple of SLOC. Yeah, I know, it’s more than the six lines the title implies, but I like to count only lines that actually do something useful.
Indexing my IMAP inbox
Using a lein repl in the project directory, I can index my inbox like this.
Syntax and coding style might not be what a seasoned Clojure dev would like to see… Please bear with me here, this is the first Clojure I have written, ever.
This creates a buffered channel (very similar to Unix pipes) and connects the IMAP client to one end and the search engine indexer to the other.
Both the IMAP client and the indexer run in their own threads, and the channel takes care of the synchronization.
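As a rough illustration of the channel pattern described above, here is a minimal sketch. It assumes core.async is on the classpath. `fetch-messages` and `run-pipeline` are hypothetical names, with plain maps standing in for real IMAP messages and a vector standing in for the Lucene index, so this shows the plumbing only, not the actual project code.

```clojure
(require '[clojure.core.async :refer [chan thread >!! <!! close!]])

;; Hypothetical stand-in for the IMAP client: yields five fake messages.
(defn fetch-messages []
  (map (fn [i] {:id i :subject (str "Message " i)}) (range 5)))

(defn run-pipeline
  "Pushes messages through a buffered channel from a producer thread to a
  consumer thread and returns the ids the consumer received, in order."
  []
  (let [ch       (chan 2)                      ; buffer of two messages
        _        (thread                       ; producer: the 'IMAP' side
                   (doseq [msg (fetch-messages)]
                     (>!! ch msg))             ; blocks while the buffer is full
                   (close! ch))                ; signals end of input
        consumer (thread                       ; consumer: the 'indexer' side
                   (loop [seen []]
                     (if-let [msg (<!! ch)]    ; nil once ch is closed and drained
                       (recur (conj seen (:id msg)))
                       seen)))]
    (<!! consumer)))                           ; wait for the consumer's result
```

Calling `(run-pipeline)` blocks until the consumer has drained the channel. The small buffer means the producer can only run a couple of messages ahead of the consumer, which is the back-pressure that makes the pipe analogy fit.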
Voilà! I always loved Hoare’s CSP model.
Querying
The first example is basically hello world:
This works, but we can do much better than that. The query string can be anything Lucene accepts, which goes far beyond the simple example above. Here’s what I used after playing with my eMail index for a while:
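To give a feel for the classic Lucene query parser syntax in general, here are a few illustrative query strings. The field names (`subject`, `from`, `body`) are assumptions for the sake of the example and may not match the fields in the actual index. What matches also depends on the analyzer used at index time.

```clojure
;; Illustrative query strings in Lucene's classic query parser syntax.
;; The field names subject, from and body are hypothetical.
(def example-queries
  {:boolean   "subject:invoice AND from:alice"   ; both clauses must match
   :exclude   "+lucene -spam"                    ; require one term, forbid another
   :phrase    "body:\"quarterly report\""        ; exact phrase
   :fuzzy     "body:recieve~"                    ; tolerate typos (edit distance)
   :wildcard  "subject:meet*"                    ; prefix match
   :proximity "body:\"server migration\"~5"})    ; terms within five positions
```

Any of these strings could be passed wherever the prototype takes a query string, assuming the index has fields with those names.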
The next steps, in my eyes, are to add a daemon that indexes eMails as they fly in (using IMAP IDLE) and an API for querying.
Follow the progress or download the whole project on GitHub.