Welcome to P2PNET.net - The original daily p2p and digital news site. Always First!

Microsoft spam defeater

p2p news / p2pnet: Will Lord of the Rings copyright holders soon be suing Microsoft for infringing the Strider LotR name?

Strider, you’ll recall, is the ranger character who keeps Frodo from harm. For Microsoft, though, it’s short for Strider Search Defender: Automatic and Systematic Discovery of Search Spammers through Non-Content Analysis.

“To make their URLs look more legitimate so that search users are more likely to click the links, many spammers create doorway pages on reputable domains and use their URLs in comment spamming,” says the Microsoft team. “When a user clicks on a doorway-page link in search listings, her browser is instructed to either redirect to or fetch ads listing from the actual target page, potentially operated by the spammer.”

Doorway pages are web pages created for spamdexing, or spamming the index of a search engine by inserting results for particular phrases with the purpose of sending you to a different page, explains Wikipedia. They’re also known as landing pages, bridge pages, portal pages, zebra pages (a humorous arbitrary coinage by Jill Whalen of High Rankings Advisor), jump pages, gateway pages, entry pages and by other names.

Doorway pages that redirect visitors without their knowledge use some form of cloaking, it says.

Search Defender consists of two steps, say its creators:

1. Starting with a seed list of confirmed spam URLs, the Spam Hunter supplies them as search terms (or “link:” query terms) to search engines to locate the forums and guest books at which they were spammed, gathers additional URLs from each of these pages to grow the list, and does this iteratively until the list “converges”, i.e., the list no longer grows significantly after a query iteration.

The list automatically generated from the above step is only a list of “potential” spam URLs because there can be false positives. For example, some spammed forum pages may contain earlier comments from actual users that include non-spam URLs; spammers may intentionally intersperse non-spam URLs with spam ones.

2. To filter out false positives, we feed the list of potential spam URLs to the Strider URL Tracer (which we have previously released to help trademark owners find typo-squatting domains of their websites). The tracer provides a key functionality called the Top Domain view: given a list of (primary) URLs, the tracer launches an actual browser to visit each URL and records all secondary URLs visited as a result. At the end of the batched scan, the Top Domain view provides the list of third-party domains that received secondary-URL traffic and rank them by the number of primary URLs that generated traffic to them. If the input is a list of potential spam URLs, the Top Domain view essentially highlights those target-page domains that are associated with a large number of doorway-page URLs. To further reduce false positives, we use the whitelist of legitimate ads syndicators and web-analytics servers that were heavy redirection-traffic receivers in our Strider HoneyMonkey scan of the top one million click-through URLs. The ranked Top Domain list is then used to prioritize manual investigation. Once a third-party domain is determined to be a spammer’s domain, all doorway-page URLs associated with that domain are labeled as high-potential spam URLs.
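As a rough illustration of the first step only (this is not Microsoft's code), the seed-list expansion can be sketched as a loop that stops once a round adds almost nothing new. The names `grow_spam_list`, `find_pages`, `extract_urls` and `min_growth`, the 1% convergence threshold and the toy data are all assumptions made for this example; the report does not say how "converges" was measured.

```python
from typing import Callable, Iterable, Set


def grow_spam_list(
    seeds: Iterable[str],
    find_pages: Callable[[str], Iterable[str]],
    extract_urls: Callable[[str], Iterable[str]],
    min_growth: float = 0.01,  # assumed threshold for "no longer grows significantly"
    max_rounds: int = 20,      # safety cap, also an assumption
) -> Set[str]:
    """Expand a seed list of spam URLs until it stops growing significantly."""
    known: Set[str] = set(seeds)
    frontier: Set[str] = set(known)  # URLs not yet used as search terms
    seen_pages: Set[str] = set()
    for _ in range(max_rounds):
        discovered: Set[str] = set()
        for url in frontier:
            # find_pages stands in for a search-engine query that returns
            # the forums and guest books where this URL was spammed.
            for page in find_pages(url):
                if page in seen_pages:
                    continue
                seen_pages.add(page)
                discovered.update(extract_urls(page))
        new_urls = discovered - known
        known |= new_urls
        frontier = new_urls
        # Treat the list as converged when this round added too little.
        if len(new_urls) < min_growth * len(known):
            break
    return known


# Toy stand-ins for a search engine and a page fetcher (hypothetical data).
SPAMMED_AT = {
    "http://a.example/door1": ["forum1"],
    "http://b.example/door2": ["forum2"],
}
PAGE_LINKS = {
    "forum1": ["http://a.example/door1", "http://b.example/door2"],
    "forum2": ["http://b.example/door2", "http://c.example/door3"],
}

potential = grow_spam_list(
    ["http://a.example/door1"],
    find_pages=lambda url: SPAMMED_AT.get(url, []),
    extract_urls=lambda page: PAGE_LINKS.get(page, []),
)
print(sorted(potential))
```

The result is still only a list of potential spam URLs: anything a real user posted on the same forum page is swept up too, which is why the second step is needed.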

Our Search Defender approach has two desirable properties that naturally turn the spammers’ spamming activities against themselves:

1. The more widely spammed a URL is, the easier it is for the spam hunter to find it. Once a spammed forum is identified, it becomes a “HoneyForum” that can be used to capture new spam URLs in new comment postings. Ideally, since there is a delay between spamming and its effect on search engine results, our spam hunter should be able to identify new spam URLs and notify the search engine before the URLs enter top search results.

2. The more doorway pages a spammer creates, the higher priority its target-page domain is placed on the Top Domain list for investigation.
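The second property follows directly from how a Top Domain style ranking counts. A minimal sketch, assuming the browser traces are already available as a mapping from each primary URL to the secondary URLs it triggered (the real URL Tracer drives an actual browser, which is not reproduced here). The function names, the use of the hostname as a stand-in for "domain", and the sample data are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple
from urllib.parse import urlsplit


def host(url: str) -> str:
    """Lower-cased hostname of a URL, or '' if it has none."""
    return (urlsplit(url).hostname or "").lower()


def rank_top_domains(
    traces: Dict[str, Iterable[str]],
    whitelist: Set[str],
) -> List[Tuple[str, int]]:
    """Rank third-party hosts by how many primary URLs sent traffic to them."""
    primaries_by_domain: Dict[str, Set[str]] = defaultdict(set)
    for primary, secondaries in traces.items():
        primary_host = host(primary)
        for secondary in secondaries:
            domain = host(secondary)
            # Skip same-site traffic and whitelisted ad/analytics servers.
            if not domain or domain == primary_host or domain in whitelist:
                continue
            primaries_by_domain[domain].add(primary)
    # Most-referenced domain first; ties broken alphabetically.
    return sorted(
        ((d, len(p)) for d, p in primaries_by_domain.items()),
        key=lambda item: (-item[1], item[0]),
    )


# Hypothetical traces: three doorway pages, all pulling from one target.
traces = {
    "http://blog1.example/a": ["http://target.example/ads", "http://stats.example/hit"],
    "http://blog2.example/b": ["http://target.example/ads"],
    "http://blog3.example/c": ["http://target.example/x", "http://other.example/y"],
}
ranked = rank_top_domains(traces, whitelist={"stats.example"})
print(ranked)  # [('target.example', 3), ('other.example', 1)]
```

Each extra doorway page adds one more primary URL to its target's count, so the spammer's own volume pushes the target domain up the list for manual review.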

The team says it’s released the preliminary study to raise awareness by providing a systematic analysis and proposing a solution so the web community can start working together to combat this problem.

"We urge owners of blog sites and free hosting sites to actively monitor their websites to detect abuse," it says, adding:

"Similarly, advertisement syndicators can detect potential spammers by monitoring those customers who serve ads on a huge number of different URLs through a single account because it is highly unlikely that anyone can generate quality content at that scale. Second, although the content on some spam pages may actually have decent relevance, we urge search engines to consider removing such pages so as not to encourage web spamming. Third, we urge owners of publicly accessible forums (and guest books, etc.) to do a local search of “blogspot.com” and other spam-related domain names reported on this page to see if their forums have been abused and should be protected.

"For example, searching for “blogspot.com” at http://www.stat.ucla.edu/forums/search.php?f=325, or searching for “funpic.org”, or “yoll.net”, or “freett.com”, or “fc2.com” at http://coolplayer.sourceforge.net/phorum/search.php?f=2 would generate a large number of hits.

"Finally, in some cases, the owners of the target-page domains may not be directly involved in the spamming activities of the doorway pages that redirect to them; their “affiliates” may be the ones who are actually performing the spamming. We urge the owners of such target-page domains to have a stronger rule that prohibits their affiliates from using spamming techniques to draw traffic."
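For forum owners who would rather script the "local search" the team recommends than run it by hand, a minimal sketch follows. It assumes the forum's posts can be exported as a mapping of post ID to text; the function name, the export format and the sample posts are hypothetical, and only the domain names come from the report.

```python
import re
from typing import Dict, Iterable, List
from urllib.parse import urlsplit

URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def find_spammed_posts(
    posts: Dict[str, str],
    suspect_domains: Iterable[str],
) -> Dict[str, List[str]]:
    """Map post ID -> URLs in that post hosted on (or under) a suspect domain."""
    suspects = [d.lower() for d in suspect_domains]
    hits: Dict[str, List[str]] = {}
    for post_id, body in posts.items():
        for raw in URL_RE.findall(body):
            url = raw.rstrip(".,;:)!?")  # drop trailing sentence punctuation
            try:
                hostname = (urlsplit(url).hostname or "").lower()
            except ValueError:
                continue  # malformed URL, ignore it
            if any(hostname == d or hostname.endswith("." + d) for d in suspects):
                hits.setdefault(post_id, []).append(url)
    return hits


# Domains named in the report; the posts themselves are made up.
SUSPECT_DOMAINS = ["blogspot.com", "funpic.org", "yoll.net", "freett.com", "fc2.com"]
posts = {
    "101": "Great site! http://cheap-pills.blogspot.com/ and http://x.funpic.org/page",
    "102": "The lecture notes are at http://www.stat.ucla.edu/ as usual.",
}
hits = find_spammed_posts(posts, SUSPECT_DOMAINS)
print(hits)
```

A hit is only a candidate for review, not proof of spam: these are free hosting domains that also carry plenty of legitimate pages.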

Meanwhile, "But what’s Google doing?" wonders Steve Bryant on eWeek, going on: "CEO Eric Schmidt doesn’t seem to be interested in fighting spam. He seems to believe that click fraud – much of which happens on spam made-for-AdSense sites – should just happen.

"Google’s best suggestion: incorporate the ‘nofollow’ attribute for hyperlinks in comments left by users, so that comments don’t get any credit when Google ranks Web sites in search results. Google also recently made changes to the way that its algorithms judge the validity of advertisers’ landing pages. While welcomed by many, the change also has the effect of enforcing Google’s design standards on private sites.

"The merits of Google’s decision can be debated back and forth. But the differing approaches the two companies are taking vis-a-vis spam are enlightening:

"Microsoft wants to remove the inconvenience altogether, whereas Google seems to want to push it onto consumers."
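For site owners wondering what the ‘nofollow’ suggestion Bryant mentions looks like in practice, here is a minimal sketch of a comment renderer that escapes user text and marks every link it creates with rel="nofollow". The function name and the simple URL pattern are assumptions for this example; a real blog or forum package will have its own templating and sanitising code.

```python
import html
import re

URL_RE = re.compile(r"https?://[^\s<>\"']+")


def render_comment(text: str) -> str:
    """Escape a user comment and turn bare URLs into rel="nofollow" links."""
    parts = []
    last = 0
    for match in URL_RE.finditer(text):
        # Escape the plain text before the URL so user HTML is not rendered.
        parts.append(html.escape(text[last:match.start()]))
        url = html.escape(match.group(0), quote=True)
        parts.append(f'<a href="{url}" rel="nofollow">{url}</a>')
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)


rendered = render_comment("Visit http://spam.example/buy now")
print(rendered)
# Visit <a href="http://spam.example/buy" rel="nofollow">http://spam.example/buy</a> now
```

The link still works for a human reader. The attribute only asks search engines not to count it toward ranking, which is why Bryant describes the approach as leaving the nuisance in place.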


Also See:
Microsoft - Technical Report: MSR-TR-2006-97, July 12, 2006
eWeek - Dear Google: When It Comes to Spam, Is Microsoft the Good Guy?, July 13, 2006

