Internet and e-mail policy and practice
including Notes on Internet E-mail



08 Oct 2014

How can we do spam filtering on mail we can't read?

For reasons that should be obvious, a lot of people are thinking about ways to make e-mail more secure and harder to spy on. The most likely scenario is an improved version of PGP or S/MIME, two existing encrypted mail systems that let people publish their encryption keys, which correspondents use to encrypt mail so that only the recipient can read it. While this is a significant improvement in privacy, it has the problem that spam filters at the ISP can't read the mail either.

Modern spam filters use a combination of techniques, some of which are still usable without looking at the contents of the messages, but many of which are not.

Identity tools

Filtering by source IP reputation is one of the oldest and still one of the most effective techniques. Some IP reputation comes from external sources (Spamhaus is the usual example), while some is local to a mail system, based on their own experience. When an IP's reputation is poor enough, a mail system can reject mail sight unseen, with little chance of a mistake. If the mail still arrives by SMTP, which seems a fairly safe assumption for now, filtering by IP still works.
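As a rough illustration of why IP reputation needs nothing from the message body, here is a minimal sketch of a DNS blocklist lookup in the usual DNSBL style: reverse the address, append the list's zone, and see whether the name resolves. The function names and the injectable `resolve` parameter are my own assumptions for illustration, and a real mail system would use a proper resolver with timeouts and would follow the list operator's usage terms.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up for `ip` in the blocklist `zone`."""
    addr = ipaddress.ip_address(ip)
    # reverse_pointer yields e.g. "99.2.0.192.in-addr.arpa"; drop the
    # trailing two labels and keep only the reversed address part.
    reversed_labels = addr.reverse_pointer.rsplit(".", 2)[0]
    return f"{reversed_labels}.{zone}"


def is_listed(ip: str, zone: str, resolve=socket.gethostbyname) -> bool:
    """Return True if the blocklist answers with a 127.x.x.x address.

    `resolve` is injectable (an assumption of this sketch) so the logic
    can be exercised without network access.
    """
    try:
        answer = resolve(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # Lookup failed, typically NXDOMAIN: treat as not listed.
        return False
    return answer.startswith("127.")
```

Nothing here touches the message at all, so it works the same whether the body is encrypted or not.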

Under most scenarios, message authentication still works with encrypted mail. SPF checks work the same as with any other kind of mail, since they're based on the IP address from which the message was sent and the envelope bounce address on the message. DKIM signatures also work, since an outgoing mail system can add a DKIM signature to any message, encrypted or otherwise. PGP and S/MIME also allow the sender to sign the body of the message with a signature tied to their e-mail address, providing another identity to use for filtering. These are all useful for whitelisting mail from known senders with good reputations, but that still leaves out a lot of mail, notably mail from most public webmail providers and any other provider that has a mix of good and bad customers. Authentication can also be harder if the mail is encrypted in a way that obscures the identity of the sender.
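To make the whitelisting limitation concrete, here is a small sketch of a decision that uses only authenticated identities. It assumes the SPF and DKIM checks have already been done elsewhere; the `AuthResult` type, the `classify` function, and the example domains are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class AuthResult:
    """Outcome of SPF and DKIM checks done elsewhere (hypothetical type)."""
    spf_pass: bool
    envelope_domain: Optional[str]  # domain of the envelope bounce address
    dkim_pass: bool
    dkim_domain: Optional[str]      # signing domain from the DKIM signature


def authenticated_domains(result: AuthResult) -> Set[str]:
    """Collect the domains that the message is verifiably tied to."""
    domains = set()
    if result.spf_pass and result.envelope_domain:
        domains.add(result.envelope_domain.lower())
    if result.dkim_pass and result.dkim_domain:
        domains.add(result.dkim_domain.lower())
    return domains


def classify(result: AuthResult, known_good: Set[str]) -> str:
    """Whitelist only when an authenticated domain has a good reputation."""
    if authenticated_domains(result) & known_good:
        return "whitelist"
    # Authenticated mail from a mixed-reputation provider, or mail with no
    # authentication at all, still needs some other kind of filtering.
    return "needs-more-filtering"
```

A message properly signed by a big webmail provider lands in the second bucket, which is exactly the mail that body filtering normally sorts out.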

Session tools

Some filtering techniques use characteristics of the mail delivery session, since mail sent by malware often has defects or peculiarities different from mail sent by real mail software. These should still work. Indeed, if the malware encrypts the mail to try to prevent body filtering, it's fairly likely that the encrypted version of the mail will have its own markers that session tools can recognize.
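A sketch of what a session check might look like follows. The specific signals (a client that talks before the server's greeting, a HELO name that claims to be the receiving host, a HELO name with no dots) are common examples I've picked for illustration, not a list taken from any particular filter, and the weights are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class SessionFacts:
    """Observations about one SMTP session (hypothetical structure)."""
    spoke_before_greeting: bool  # client sent data before our banner
    helo_name: str               # name the client gave in HELO/EHLO
    our_hostname: str            # the receiving server's own name


def session_score(s: SessionFacts) -> int:
    """Add up illustrative penalty points; higher means more suspicious."""
    score = 0
    if s.spoke_before_greeting:
        score += 2
    helo = s.helo_name.lower()
    if helo == s.our_hostname.lower():
        # A client claiming to be the server it is connecting to.
        score += 2
    elif "." not in helo and not helo.startswith("["):
        # Neither a dotted host name nor a bracketed address literal.
        score += 1
    return score
```

None of these checks looks at the message body, so encryption doesn't affect them.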

Body filtering

Despite all that, there's still no substitute for looking at the body of the message to see how spammy it looks. But if the message is encrypted, the body isn't available until the end user's mail program decrypts it. One possibility is to do mail filtering in the mail program. This isn't a new idea; many desktop programs like Thunderbird and Outlook have done this for a long time. But I don't think it's likely to be very successful. One reason is that good filtering requires constant updates to the filtering rules. In principle the mail provider could download updated filtering rules to the user's mail program, but in reality that's a lot of extra traffic, and more importantly it exposes the filtering rules to the users, who are not necessarily friendly. (Spammers sign up for accounts and send themselves test messages all the time, to see if they get through the filters now.)

Also, mail programs increasingly don't run on PCs; they run on tablets and smartphones. Downloading the mail, determining that a lot of it is spam, and putting the spam back in a spam folder takes a lot of network traffic and a lot of computing for a handheld device with limited computing power.

I expect that what will really happen is that the minority of users who are serious about their mail security will do it on their PCs, or perhaps will arrange for some sort of cryptographically encapsulated process in the cloud from which only they can extract the filtered mail. Everyone else will give their private keys to their webmail provider and let them do the filtering, on the (rather dubious) theory that the mail provider is trustworthy.

posted at: 23:36 :: permanent link to this entry :: 0 comments


© 2005-2020 John R. Levine.
CAN SPAM address harvesting notice: the operator of this website will not give, sell, or otherwise transfer addresses maintained by this website to any other party for the purposes of initiating, or enabling others to initiate, electronic mail messages.