An Englishman based in France, John Graham-Cumming, is about 666,666 clicks away from creating a weapon to kill spam for good.
Graham-Cumming, who lives in Toulouse, is the seasoned spam fighter who wrote Popfile, an open-source email classification tool. He also wrote Polymail, an antispam library licensed by other companies for use in spam filters.
Spam still comprises about 80 percent of all email, although it has become less of an annoyance due to much-improved filtering. But spammers persevere, finding ways of slipping email through, and the race continues to develop sharper filters.
"I don't think spam is going to go away," Graham-Cumming said. "Clearly spammers are still making money or they wouldn't be sending lots of spam."
Graham-Cumming's new project asks people to donate their time to classify a 'corpus' of 100,000 email messages used to test the accuracy of spam filters. He's set up a site where people are shown messages at random and sort each one as either 'spam' or 'ham', the term for legitimate email.
The email messages make up the TREC (Text Retrieval Conference) 2005 Public Spam Corpus, which is affiliated with the US NIST (National Institute of Standards and Technology).
An unlikely major donor of the email was Enron, the US energy company whose errant accounting practices led to bankruptcy in 2001. The email of dozens of Enron employees was subpoenaed and eventually released to the public.
The Enron email messages are a hot commodity for spam research – a rich trove of private email and spam that's hard to come by, Graham-Cumming said.
The idea is for each email to be classified 10 times for a majority consensus. So far, the project is about one-third done.
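For readers who want to see the voting scheme concretely, here is a minimal Python sketch of a majority consensus over 10 votes, along with the arithmetic behind the remaining click count. The function names are hypothetical, and the handling of a 5-5 tie (returning no verdict) is an assumption; the project has not said how it resolves split votes.

```python
from collections import Counter
from fractions import Fraction

TOTAL_MESSAGES = 100_000   # size of the corpus, per the article
VOTES_PER_MESSAGE = 10     # each message is classified 10 times


def consensus(votes):
    """Return the label backed by a strict majority of votes, or None.

    Assumption: a tie (e.g. 5 'spam' vs 5 'ham') yields no verdict.
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count * 2 > len(votes) else None


def clicks_remaining(fraction_done):
    """Classifications still needed, given the fraction already completed."""
    total = TOTAL_MESSAGES * VOTES_PER_MESSAGE
    return int(total * (1 - Fraction(fraction_done)))


if __name__ == "__main__":
    print(consensus(["spam"] * 7 + ["ham"] * 3))
    print(consensus(["spam"] * 5 + ["ham"] * 5))
    print(clicks_remaining(Fraction(1, 3)))
```

With 100,000 messages and 10 votes each, the project needs 1,000,000 classifications in total, so a project that is one-third done has roughly two-thirds of a million clicks to go.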
Most messages are easy for anyone vaguely familiar with email to classify. But overall, machines and people disagree about one time in 10, Graham-Cumming said.
Not surprisingly, phishing email messages, which often look quite legitimate but dupe people into divulging personal details, are hardest for people to distinguish, Graham-Cumming said.
The research could be used to publish an updated corpus, one that more precisely classifies what is spam and what is ham, Graham-Cumming said. It may also yield new insight into phishing, which continues to flourish despite better awareness.
"I'd be very interested in discovering if there are certain sorts of legitimate mail that always gets filtered," Graham-Cumming said.
Those who participate in classifying messages have a chance to win a suite of Austin Powers movie trinkets, including an 'Enlarger'.
What's an enlarger? Check your junk mail box.