This, uh, fell off a truck. I cannot vouch for its authenticity.
> Who wants to answer this one?
Oh, what the heck, tell him about it.
---- snip ----
Our spamtraps are pretty straightforward. We consider a significant
ISP to be one with a thousand or more mailboxes, so for fairness we
have spamtraps at all of them. We also have honeypots, old user
accounts, and a couple more kinds we can't discuss due to pending
patent applications. (By the way, did you know the third largest taxi
fleet in NYC is run by the NYPD?)
As the spam comes in to the network of spamtraps, it's analyzed in
real time. We used to do it in software with pattern matching and
scoring and OCR of images, but it turns out to be faster and cheaper
to have people do it, using systems similar to Amazon's Mechanical
Turk. We've found that people can classify 20 to 30 messages per
minute, more if they happen to speak the language the message is
written in. We used to give each message to multiple people to
cross-check the results, but that slowed things down so we don't any
more.
Once we have the spam classified by source, topic, URL, and product,
then we do a comparison against normalized traffic for the Internet as
a whole. In cooperation with several Tier I NSPs we do a whole-net
survey three times a year, sampling traffic through their backbone
routers to get a baseline level of spam and malware. Then we look for
significant deviations from the baseline to decide what to list in our
proprietary blacklists. So if, for example, the baseline traffic
level is 0.0001% spam, and we see spamtrap traffic from an IP at the
0.001% level, that's an order of magnitude up, which makes it a slam
dunk to list.
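For anyone who wants the arithmetic spelled out, here is a minimal
sketch of that baseline comparison. The function names and the 10x
cutoff are assumptions made up for illustration; the description above
only says an order-of-magnitude jump is a slam dunk, not where the
actual line is drawn. Percentages are passed as strings so the ratio
is computed exactly rather than in floating point.

```python
from fractions import Fraction


def deviation_ratio(observed_pct: str, baseline_pct: str) -> Fraction:
    """Return how many times the observed spam share exceeds the baseline.

    Both arguments are percentages written as decimal strings, e.g. "0.001".
    """
    baseline = Fraction(baseline_pct)
    if baseline <= 0:
        raise ValueError("baseline spam percentage must be positive")
    return Fraction(observed_pct) / baseline


def should_list(observed_pct: str, baseline_pct: str, min_ratio: int = 10) -> bool:
    """Hypothetical listing rule: list when the ratio reaches min_ratio.

    min_ratio=10 (one order of magnitude) is an assumed cutoff.
    """
    return deviation_ratio(observed_pct, baseline_pct) >= min_ratio


if __name__ == "__main__":
    # The example from the text: baseline 0.0001%, spamtrap traffic at 0.001%.
    print(deviation_ratio("0.001", "0.0001"))  # 10
    print(should_list("0.001", "0.0001"))      # True
```

With the numbers from the example, the ratio comes out to exactly 10,
so the IP clears the assumed cutoff.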
Any more questions, feel free to ask.