Researchers at Microsoft have released a new report and tool aimed at preventing web spammers from exploiting internet search engines to drive traffic to spam URLs.
The tool, called the Strider Search Defender, identifies spam URLs that are being distributed through social networking, forum and blog-hosting websites, and can prevent those URLs from being indexed by search engines, said Yi-Min Wang, group manager of the Cybersecurity and Systems Management Research Group in Microsoft Research.
Rather than commenting on individual user pages of popular forums and blog sites, such as Google BlogSpot or MySpace, spammers post URLs that link to spam websites on as many internet forum pages as they can, he said. Because these URLs appear so frequently on valid websites, search engines such as Google, Yahoo and Microsoft's own MSN index them, and the URLs begin appearing in search results, Wang said.
"They create a URL they want people to click and they put that into every possible open forum and guest book they can," he said. "Some search engines will see that this URL is everywhere on the web so [they think] it should be popular. But it doesn't have the kind of relevance to be in the top search-engine results."
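The pattern Wang describes, a single URL posted across many unrelated forums so that it looks popular, can be illustrated with a toy counter. This is a hypothetical sketch, not the Strider Search Defender's actual method: the function names, the `min_hosts` threshold and the example hosts are all invented for illustration.

```python
from collections import defaultdict
from urllib.parse import urlparse


def hosts_per_link(posts):
    """Map each posted link to the set of distinct forum hosts it was posted on."""
    spread = defaultdict(set)
    for page_url, link in posts:
        spread[link].add(urlparse(page_url).hostname)
    return spread


def flag_widely_posted(posts, min_hosts=3):
    """Return links posted on at least `min_hosts` distinct hosts, sorted.

    `min_hosts` is an arbitrary illustrative threshold, not a published value.
    """
    return sorted(
        link for link, hosts in hosts_per_link(posts).items()
        if len(hosts) >= min_hosts
    )


# Invented crawl sample: (forum page where a link was posted, the posted link).
posts = [
    ("http://forum-a.example/thread/1", "http://spam.example/pills"),
    ("http://forum-b.example/guestbook", "http://spam.example/pills"),
    ("http://forum-c.example/t/9", "http://spam.example/pills"),
    ("http://forum-a.example/thread/2", "http://news.example/story"),
    ("http://forum-a.example/thread/3", "http://news.example/story"),
]

print(flag_widely_posted(posts))  # prints ['http://spam.example/pills']
```

A raw count like this would also flag a legitimate link that happens to be widely shared, which is why a separate check that tells legitimate forum URLs apart from spam URLs is needed on top of it.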
The tool uses elements of technology previously developed at Microsoft Research in projects called Strider, HoneyMonkey and Typo Patrol to search forums that have been spammed and to identify spam URLs, in the hope of removing them before search engines index them. It also includes a component that distinguishes legitimate URLs on web forums from spam URLs, Wang said.
When a spammer uses what is called a 'doorway domain' to set up a spam site, the tool can identify the domain being exploited and notify its administrators, he said. A doorway domain is a legitimate domain, such as www.blogger.com, on which spammers set up a spam site so that it looks like a valid website and thus fools both users and search engines.
"If they put [what looks like a] blog URL into your forum and everyone else’s, they will fool the search engine," Wang said.
In addition to the tool's specifications, Microsoft Research's report includes information intended to encourage owners of free web-hosting sites, search engines and publicly accessible web forums to do what they can to stop web spammers from exploiting search engines.
Wang said free web-hosting sites such as MySpace and Google BlogSpot can use Microsoft's methodology to identify spammers who might be using their sites as doorway domains. He said he hopes search-engine companies will use the tool specifications described in the report to optimise their search engines to ferret out spam URLs.
Additionally, users who have blogs or forums on web-hosting sites can help alleviate web spamming by shutting down sites that they no longer visit or use but that are still active online, Wang said.