Internet and e-mail policy and practice
including Notes on Internet E-mail


03 Jun 2008

Fourteen thousand messages

A guy I know went away on a trip for a month and a half. When he got back, his inbox had 14,000 messages waiting for him, all real ones, since his mail system has pretty good spam filtering. How can anyone deal with that much mail? More importantly, if there are tools to sort, filter, combine, and so forth to get the mail under control, how can people who, unlike me, aren't technoweenies manage the tools?


The standard way to manage large amounts of mail is threading, that is, collecting related messages into threads. This has two advantages: you can read related messages in sequence, which is faster than having to remember for each message in your mailbox what the discussion was about, and if a thread is boring, you can ignore the whole thing with a keystroke.
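
As a rough illustration of what threading involves, here is a minimal Python sketch that groups messages by the first entry in their References header. It assumes the messages carry the standard Message-ID, In-Reply-To, and References headers; the function names and sample messages are made up for the example, and real mail programs use more elaborate algorithms that also cope with missing or mangled headers.

```python
from collections import defaultdict
from email import message_from_string


def thread_root(msg):
    """Guess the Message-ID of the message that started this thread."""
    refs = (msg.get("References") or "").split()
    if refs:
        return refs[0]  # References lists ancestors, oldest first
    # Fallback: the parent is only the true root for direct replies.
    parent = (msg.get("In-Reply-To") or "").strip()
    return parent or (msg.get("Message-ID") or "").strip()


def group_threads(messages):
    """Group messages into lists keyed by their guessed thread root."""
    threads = defaultdict(list)
    for msg in messages:
        threads[thread_root(msg)].append(msg)
    return dict(threads)


# Made-up sample messages, for illustration only.
raw = [
    "Message-ID: <1@example.com>\nSubject: lunch\n\nWhere?",
    "Message-ID: <2@example.com>\nIn-Reply-To: <1@example.com>\n"
    "References: <1@example.com>\nSubject: Re: lunch\n\nThe usual.",
    "Message-ID: <3@example.com>\nSubject: budget\n\nSee attached.",
]
threads = group_threads(message_from_string(r) for r in raw)
for root, msgs in threads.items():
    print(root, [m["Subject"] for m in msgs])
```

Once messages are grouped this way, ignoring a boring thread amounts to dropping one dictionary entry rather than deleting its messages one at a time.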

Threading has been around forever, going back at least to the 1980s when it appeared in Usenet newsreaders. (Usenet users had to deal with vast numbers of messages from the beginning, since in principle every user can see every message.)

Multiple inboxes

A variant on threading is multiple inboxes, for people whose mail programs can handle them, which is most of them. I subscribe to a lot of mailing lists, and set up my mail so that most lists have their own inboxes, so I can read the lists when I get around to it without cluttering my main inbox. I also gateway a lot of lists into Usenet-style newsgroups on my local news server, which lets me take advantage of the superior intra-group threading in newsreaders, and of automatic expiration that gets rid of messages after a while whether I've read them or not.

Mail that's really something else

A large fraction of the 14,000 messages weren't from people, but were automated status reports from a variety of systems. These may look like mail messages, but they're really not; they're just disguised as mail because that's the easiest way to send messages across a network and queue them until the recipient handles them. These are really transactions, and for the most part, there's no reason that anyone needs to look at them all. It'd be a lot more useful if they could be intercepted on the way into the inbox so that only the most recent one of each category was visible, or collected statistics were available in a sub-window, or the like. Sending transactions by mail is a perfectly reasonable thing to do. What's unreasonable is dealing with them by hand, when computers can read them for you, extract the interesting bits, and do something better than show you the text.
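
To make the "only the most recent one of each category" idea concrete, here is a small Python sketch. It assumes, purely for illustration, that each sender address corresponds to one category of report and that every message has a parseable Date header; the function name and the sample reports are invented.

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime


def latest_per_sender(messages):
    """Return a dict mapping each sender address to its newest message."""
    latest = {}
    for msg in messages:
        # Assumption: one sender address corresponds to one report category.
        sender = parseaddr(msg.get("From", ""))[1].lower()
        sent = parsedate_to_datetime(msg["Date"])
        if sender not in latest or sent > latest[sender][0]:
            latest[sender] = (sent, msg)
    return {sender: msg for sender, (_sent, msg) in latest.items()}


# Made-up status reports, for illustration only.
raw_reports = [
    "From: backup@example.com\nDate: Sun, 01 Jun 2008 03:00:00 +0000\n"
    "Subject: backup OK\n\n",
    "From: backup@example.com\nDate: Mon, 02 Jun 2008 03:00:00 +0000\n"
    "Subject: backup FAILED\n\n",
    "From: disk@example.com\nDate: Mon, 02 Jun 2008 04:00:00 +0000\n"
    "Subject: disk 87% full\n\n",
]
latest = latest_per_sender(message_from_string(r) for r in raw_reports)
for sender, msg in sorted(latest.items()):
    print(sender, "->", msg["Subject"])
```

Three reports collapse to two visible entries, one per sender, with the older backup report hidden behind the newer one.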

This is not a new observation. I sketched out a model of automatic transaction mail handling in the early 1980s and I doubt it was new then.

Setting up mail management

The hard part about all this swell mail management, for normal people at least, is setting up all the sorting and filtering. Mail threading happens automatically in most modern mail programs, but the sorting has to be set up manually.

The standard filtering tool on Unix and Linux systems is a package called procmail, which in the classic Unix tradition combines great power and flexibility with an input syntax that is terse to the point of impenetrability. (I've been using procmail for a decade, and I still need to look at the manual when I add new configuration rules.) Once your procmail configuration files are set up, it can sort mail as it's delivered any way you want, but setting up the configuration is hard for two reasons. One is the awful procmail syntax, somewhat ameliorated by cut-and-paste examples available online. The other is figuring out how to describe your filtering plans in terms that procmail can understand.

Sorting mail from mailing lists is relatively easy, since all the mail from a given list tends to contain fixed strings one can sort by. Discussion lists now usually contain a List-ID: header, which is there specifically to help sorting, and newsletters and other broadcast mail tend to use a fixed From: address, which they tell you to put into your address book to deter spam filters. Although I haven't seen tools to do this, it wouldn't be hard to add a button to a mail program that says "sort mail like this into a separate mailbox". Even without the button, most mail programs and web mail offer their own sorting rules that can match fixed strings and put mail into folders.
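
Sorting on the List-ID: header can be sketched in a few lines of Python. The list identifiers and folder names below are hypothetical, and the sketch assumes the identifier appears in angle brackets at the end of the header, which is the usual layout.

```python
import re
from email import message_from_string

# Hypothetical mapping from list identifiers to mailbox names.
LIST_FOLDERS = {
    "cooking.example.org": "lists/cooking",
    "sysadmin.example.org": "lists/sysadmin",
}


def folder_for(msg, default="inbox"):
    """Pick a mailbox for a message based on its List-ID header."""
    list_id = msg.get("List-ID", "")
    match = re.search(r"<([^>]+)>", list_id)
    if match:
        return LIST_FOLDERS.get(match.group(1).strip().lower(), default)
    return default


list_mail = message_from_string(
    "List-ID: Cooking talk <cooking.example.org>\nSubject: soup\n\nRecipe."
)
personal = message_from_string("Subject: hello\n\nHi there.")
print(folder_for(list_mail))
print(folder_for(personal))
```

Mail from a list that isn't in the table falls through to the default mailbox, so a new subscription never gets lost; it just lands in the main inbox until someone adds a rule for it.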

Once the categories get a little fuzzier, the sorting gets harder. I have a separate catchall mailbox for all of the newsletters that aren't high enough volume or high enough interest to merit their own mailbox. There's no procmail pattern that will match all the list mail, so I've ended up having to list every address to which my list mail is sent (over 180 of them, since I give each one a separate sub-address). That works, it's simple computer programming, but it's not exactly E-Z maintenance.
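
The "list every address" approach looks roughly like the following Python sketch. The addresses and the sub-address format are invented for illustration, and the sketch checks the To: and Cc: headers for simplicity; a real delivery-time filter would more likely look at the envelope recipient, since bulk mail doesn't always name the recipient in its headers.

```python
from email import message_from_string
from email.utils import getaddresses

# Hypothetical sub-addresses handed out to low-volume newsletters.
NEWSLETTER_ADDRESSES = {
    "me-gardening@example.com",
    "me-airfares@example.com",
}


def is_newsletter(msg):
    """True if any To: or Cc: recipient is a known newsletter sub-address."""
    recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
    return any(addr.lower() in NEWSLETTER_ADDRESSES for _name, addr in recipients)


newsletter = message_from_string(
    "To: Me <me-airfares@example.com>\nSubject: fare sale\n\nCheap seats."
)
personal = message_from_string("To: me@example.com\nSubject: hello\n\nHi.")
print(is_newsletter(newsletter), is_newsletter(personal))
```

The code is trivial; the maintenance burden is the set at the top, which has to grow by one entry with every new subscription.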

For the transaction mail, procmail picks out the messages, or they're sent to unique sub-addresses, but then I've had to write custom shell or Perl scripts that pick out the useful info, update databases, or whatever. This is fine if you're a programmer or have a programmer who can set it up for you, and hopeless otherwise.
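
A script of that kind might look like the Python sketch below, written in Python rather than shell or Perl. The report format ("Disk usage: NN%"), the table layout, and the names are all made up for the example; each real report needs its own pattern.

```python
import re
import sqlite3
from email import message_from_string

# Hypothetical report format: a body line such as "Disk usage: 87%".
USAGE = re.compile(r"^Disk usage:\s*(\d+)%", re.MULTILINE)


def record_report(db, msg):
    """Extract the usage figure from a report and store it; False if absent."""
    match = USAGE.search(msg.get_payload())
    if match is None:
        return False
    db.execute(
        "INSERT INTO disk_usage (reported, percent) VALUES (?, ?)",
        (msg["Date"], int(match.group(1))),
    )
    return True


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE disk_usage (reported TEXT, percent INTEGER)")
report = message_from_string(
    "From: monitor@example.com\nDate: Mon, 02 Jun 2008 03:00:00 +0000\n"
    "Subject: nightly report\n\nHost: mail1\nDisk usage: 87%\n"
)
stored = record_report(db, report)
rows = db.execute("SELECT percent FROM disk_usage").fetchall()
print(stored, rows)
```

Even this short script assumes someone who can write a regular expression and a SQL statement, which is exactly the barrier for non-programmers.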

Mail management for normal people

I've been thinking about how people who aren't inclined to maintain their own software can manage high volume mail. It's an interesting problem, more of concepts than of code, for which I see two overlapping approaches. One is the general idea of "filter mail like this": you show the mail program a handful of messages, and it intuits a sorting rule to match those messages. The other wanders into the murky realm of artificial intelligence and expert systems, software that tries to encapsulate the rules that human experts use to analyze situations, applied to mail filtering.
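
One very simple way the "filter mail like this" idea could work is to look for header values that all the example messages share. The Python sketch below is only that naive version: the candidate headers, function names, and sample messages are assumptions for illustration, and anything fuzzier than an exact match is beyond it.

```python
from email import message_from_string

# Headers considered when looking for something the examples have in common.
CANDIDATE_HEADERS = ("List-ID", "From", "Sender")


def infer_rule(examples):
    """Return (header, value) pairs shared by every example message."""
    rule = []
    for header in CANDIDATE_HEADERS:
        values = {(msg.get(header) or "").strip().lower() for msg in examples}
        if len(values) == 1 and "" not in values:
            rule.append((header, values.pop()))
    return rule


def matches(rule, msg):
    """True if the rule is non-empty and the message satisfies all of it."""
    return bool(rule) and all(
        (msg.get(header) or "").strip().lower() == value for header, value in rule
    )


# Made-up example messages the user pointed at.
examples = [
    message_from_string("From: news@shop.example\nSubject: June deals\n\n..."),
    message_from_string("From: news@shop.example\nSubject: July deals\n\n..."),
]
rule = infer_rule(examples)
new_mail = message_from_string("From: news@shop.example\nSubject: August deals\n\n...")
other = message_from_string("From: friend@example.com\nSubject: lunch?\n\n...")
print(rule, matches(rule, new_mail), matches(rule, other))
```

An empty rule deliberately matches nothing, so examples with nothing in common don't produce a filter that swallows the whole inbox.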

If you asked a handful of nontechnical mail users what they'd like to help manage mail better, other than obvious stuff like don't deliver any spam, I doubt you'd get very consistent answers beyond generally wanting to see the important stuff and not wanting to see the unimportant stuff. Human experts are pretty good at sorting messages by importance. (We used to call these experts "secretaries.") Could we capture the importance sorting skills of a good secretary in an expert system? I don't know, but with suitable grant or angel funding, I'd be thrilled to give it a try.

posted at: 13:33 :: permanent link to this entry :: 2 comments

comments...

procmail => maildrop
For just the reason given, i.e., opaque syntax, I now use Maildrop. It's from the same team as the Courier MTA (SMTP server) and works well.

It may be a bug or a feature, but if the .maildroprc is not syntactically valid, Maildrop refuses to deliver the message. The problem will show up in syslog. Maybe Maildrop needs something like visudo that edits the configuration file, then verifies it before actually replacing it. Note: procmail always delivers it, somewhere (can be a challenge to find where).

(by Jeffrey L. Taylor 02 Jun 2008 08:54)

I looked at Maildrop a while ago, and it had severe portability problems if you weren't running on Linux. (Much worse than other parts of Courier, the POP, IMAP, and web mail work fine.)

I agree that its input syntax is somewhat less obscure than procmail, but it still suffers from the problem that you need to think like a programmer to write the config files.

(by John Levine 02 Jun 2008 12:31)

© 2005-2020 John R. Levine.