Google bug blocks thousands of sites

Choking on spam, noise - bots are 'ready to give up'


Google, like the rest of us, seems to be fighting a losing battle to make sense of a rising tide of Internet garbage. But a programming error by the search engine has compounded the problem by inadvertently hiding thousands of sites from Google users.

It's been called a "Google-NACK": you enter a particular search term and Google tells you that there are thousands of matching results, but fails to return many, or any, of them.

For example, a search for keyboard bracelet returns just five sites out of "about 49,900". (Your mileage may vary, as Google results differ depending on where you are, and which way the Segway scooters are pointing - but it's a fairly typical figure.)

What's happening? Award-winning researcher Seth Finkelstein has a theory. Google's own spam filters, designed to weed out link farms created by pornographers and spammers and Scientologists, are crude, and are blocking many innocent sites.

"Technical solutions may have unintended consequences," he says.

"When Google searches for combinations of terms, pages with the terms close to each other are ranked highly. Such pages are also unfortunately often search spam pages, using a mishmash of keywords. Thus, an unusual combination of words (and a dedicated spammer) will bring spam pages near the top of the results for certain keyword searches."
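To make Finkelstein's point concrete, here is a minimal sketch of proximity scoring. It is not Google's algorithm, which is not public; the function names `min_span` and `proximity_score` and the scoring formula (number of query terms divided by the smallest window containing them all) are assumptions chosen purely for illustration.

```python
def min_span(tokens, terms):
    """Return the size of the smallest token window containing every term,
    or None if some term never appears."""
    last_seen = {}
    best = None
    for i, tok in enumerate(tokens):
        if tok in terms:
            last_seen[tok] = i
            if len(last_seen) == len(terms):
                # Smallest window ending at i starts at the oldest last-seen term.
                span = i - min(last_seen.values()) + 1
                if best is None or span < best:
                    best = span
    return best


def proximity_score(text, query):
    """Hypothetical score: 1.0 when the query terms are adjacent,
    falling towards 0 as they drift apart; 0.0 if a term is missing."""
    tokens = text.lower().split()
    terms = set(query.lower().split())
    span = min_span(tokens, terms)
    return 0.0 if span is None else len(terms) / span


genuine = "a bracelet made from the keys of an old computer keyboard"
spam = "cheap keyboard bracelet casino pills keyboard bracelet"

print(proximity_score(genuine, "keyboard bracelet"))  # 0.2
print(proximity_score(spam, "keyboard bracelet"))     # 1.0
```

Under this toy measure the keyword-stuffed page outscores the genuine one, which is exactly the weakness the quote describes, and why a filter aimed at such pages can also catch innocent sites.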

Perfect storm

One such example is Elwyn Jenkins, a spammer and former e-currency evangelist now based in Australia, who touted a pamphlet called "Make Money Online" - which boasted that "Dr. Jenkins has pioneered a unique approach to using Google and blogs to build traffic." Jenkins ran a link farm across the domains www.microdoc-news.info, www.microdocs-news.info, smoogle.info, googlevillage.info, blogging-news.info, googlology.info, microdoc.bloki.com, www.question-factory.com, meeting-mentor.blogspot.com, radio.weblogs.com/0111745, verityintellectualproperties.com, textchunk.info, personalbrain.info, technacy.info, verity-ip.com, bloggers-news.info and ...

well, you get the picture. His Googlephilia was returned in kind by bloggers, who pumped up his PageRank™ (PageRank™'s fatal flaw was incestuous linking) by linking to him approvingly, so creating a perfect storm - and an almighty headache - for Google's algorithm overlords.
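For readers wondering how a ring of interlinked domains can pump up a ranking, the following is a rough sketch of the textbook PageRank iteration as originally published, not whatever Google runs in production. The damping factor of 0.85, the tiny graphs and the page names (`a`, `b`, `target`, `farm0` and so on) are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank over a dict of page -> outgoing links."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # A page splits its damped rank evenly among the pages it links to.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outgoing links spreads its rank over everyone.
                share = damping * rank[p] / n
                for q in pages:
                    new[q] += share
        rank = new
    return rank


# A small web in which nobody links to "target".
base = {"a": ["b"], "b": ["a"], "target": ["a"]}

# The same web plus five hypothetical farm pages that all link to "target".
farmed = dict(base)
for i in range(5):
    farmed["farm%d" % i] = ["target"]

print(round(pagerank(base)["target"], 3))
print(round(pagerank(farmed)["target"], 3))
```

In this toy graph the target page's score roughly doubles once the farm pages point at it, even though the total rank is now shared among more pages. Approving links from real bloggers have the same effect, only with more weight behind each one.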

The term GoogleNACK ('Negative ACKnowledgement') was coined by Gary Stock, CTO of Nexcerpt, a web clipping service that monitors thousands of news sources. Stock, who also coined the phrase Googlewhack, has shared his research with Google.

In an effort to weed out the noise, Google constantly refines its weighting algorithm, which it says is a combination of a hundred different factors. In an attempt to thwart deliberate gaming by link farms and blog noise (exacerbated by lossy software gimmicks such as 'Trackbacks', which generate reams of content-free pages for Google's crawlers), Google has stepped back from its trademarked PageRank™ method and instead emphasized more traditional factors such as anchor text.

"I'd say the people to *whack* here are those search-spammers who are causing the problem and requiring Google's defense," says Finkelstein.

But all factors, once known, are susceptible to gaming, and perhaps no one search engine can ever hope to win an arms race against unscrupulous and determined spammers. Although calls have increased for Google to be regulated, perhaps the best defense is simply common sense: other search engines deliver surprising results that Google can't, and a wise browser will use a combination of tools. It certainly helps to shop around.

Or ask a librarian. ®
