June 23, 2006

Google Gets Spam Punk'd

By: Jason Lee Miller

When a blog called Monetize published "The Step-by-Step Guide to Getting Billions of Pages Indexed by Google," describing an SEOer's black-hat trick that got Google to return 5.5 billion spammy results for a single domain in under three weeks, the SEM world did a giant collective spit take.

Google has since deleted all of the pages cluttering up the Big Daddy database. Meanwhile, an insider told John Battelle that when Google's result-count estimates run into the billions, they can overstate the true number of results by as much as 263 times (if a commentator's math is correct).

Monetize's step-by-step guide instructs readers to:
  1. Register a meaningless domain consisting of numbers, letters, and secret symbols.
  2. Buy as many article databases as you can.
  3. Create or buy a common scraper script.
  4. Launch your blog comment spam attack.
Ana Aman offers another how-to.

MSN and Yahoo! returned nowhere near as many pages as Google did for the spamgantic domain, but Google apologists were quick to point to the speed and reach of Google's crawlers. According to Email Battles, MSN had indexed just 62 of the bogus sites in the same period.

That so many spam pages were outranking legitimate webpages, while also driving up the cost of PPC ads, raised serious questions among search engine marketers about possible click fraud and about how far legitimate sites had dropped in the rankings.

Google engineer Adam Lasnik chimed in on John Battelle's blog to shine some light on the situation:

we noticed that lots of subdomains got indexed last week -- and sometimes listed in search results -- that shouldn't have been. Compounding the issue, our result count estimates in these contexts was MANY orders of magnitude off. For example, the one site that supposedly had 5.5 billion pages in the index actually had under 1/100,000th of that.
So how did this happen? We pushed some corrupted data with our index. Once we diagnosed the problem, we started rolling the data back and pushed something better... and we've been putting in place checks so that this kind of thing doesn't happen again.

Some quick math by a later commentator cut the 5.5 billion figure to just under 21 million. The commentator, who goes by "JG," asks, without receiving an answer, what that means for the accuracy of the result counts Google reports for the rest of its index.
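For scale, here is simple arithmetic on the figures reported above (not Google's own accounting). Dividing the reported 5.5 billion by the claimed 263x overstatement gives the commentator's "just under 21 million," while Lasnik's "under 1/100,000th" bound points to a far smaller number:

$$\frac{5.5\times 10^{9}}{263} \approx 2.09\times 10^{7} \quad (\text{just under 21 million})$$

$$5.5\times 10^{9}\times\frac{1}{100{,}000} = 5.5\times 10^{4} = 55{,}000$$

By Lasnik's account, the site's real page count was below the second figure, well under the commentator's estimate.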