Google bug blocks thousands of sites

Choking on spam, noise - 'Bots are ready to give up'


Google, like the rest of us, seems to be fighting a losing battle to make sense of a rising tide of Internet garbage. But a programming error by the search engine has compounded the problem by inadvertently blocking thousands of sites from Google users.

It's been called a "Google-NACK": you enter a particular search term and Google tells you that there are thousands of matching results, but fails to return many, or any, of them.

For example, a search for keyboard bracelet returns just five sites out of "about 49,900". (Your mileage may vary, as Google results differ depending on where you are, and which way the Segway scooters are pointing - but it's a fairly typical figure.)

What's happening? Award-winning researcher Seth Finkelstein has a theory: Google's own spam filters, designed to weed out link farms created by pornographers, spammers and Scientologists, are crude, and are blocking many innocent sites.

"Technical solutions may have unintended consequences," he says.

"When Google searches for combinations of terms, pages with the terms close to each other are ranked highly. Such pages are also unfortunately often search spam pages, using a mismash of keywords. Thus, an unusual combination of words (and a dedicated spammer) will bring spam pages near the top of the results for certain keyword searches."

Perfect storm

One such spammer is Elwyn Jenkins, a former e-currency evangelist now based in Australia, who touted a pamphlet called "Make Money Online" - which boasted that "Dr. Jenkins has pioneered a unique approach to using Google and blogs to build traffic." Jenkins ran a link farm using the domains www.microdoc-news.info, www.microdocs-news.info, smoogle.info, googlevillage.info, blogging-news.info, googlology.info, microdoc.bloki.com, www.question-factory.com, meeting-mentor.blogspot.com, radio.weblogs.com/0111745, verityintellectualproperties.com, textchunk.info, personalbrain.info, technacy.info, verity-ip.com, bloggers-news.info and ...

well, you get the picture. His Googlephilia was returned in kind by bloggers, who pumped up his PageRank™ (PageRank™'s fatal flaw was incestuous linking) by linking to him approvingly, so creating a perfect storm - and an almighty headache - for Google's algorithm overlords.
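To see why mutual linking inflates a link-based score, here is a minimal sketch of the simplified, textbook form of PageRank, computed by repeated redistribution of rank along links. It is not Google's production algorithm; the page names ("news", "blog", "target", "farm0" and so on), the damping value of 0.85 and the iteration count are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank over a dict mapping each page to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the small "random jump" share.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new[p] += share
        rank = new
    return rank

# Hypothetical mini-web: nobody links to "target".
honest_web = {"news": ["blog"], "blog": ["news"], "target": ["news"]}

# The same web plus five made-up farm pages that all link to "target".
farmed_web = dict(honest_web)
for i in range(5):
    farmed_web[f"farm{i}"] = ["target"]

before = pagerank(honest_web)
after = pagerank(farmed_web)
print(f"target before farm: {before['target']:.4f}")
print(f"target after farm:  {after['target']:.4f}")
```

In this toy graph the target page's score rises once the farm pages point at it, even though no independent site has endorsed it. Approving links from real bloggers would add to that effect.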

The term GoogleNACK ('Negative ACKnowledgement') was coined by Gary Stock, CTO of Nexcerpt, a web clipping service that monitors thousands of news sources. Stock, who also coined the phrase Googlewhack, has shared his research with Google.

In an effort to weed out the noise, Google constantly refines its weighting algorithm, which it says is a combination of a hundred different factors. In an attempt to thwart deliberate gaming by link farms and blog noise (exacerbated by lossy software gimmicks such as 'Trackbacks', which generate reams of content-free pages for Google's crawlers), Google has stepped back from its trademarked PageRank™ method and instead emphasized more traditional factors such as anchor text.

"I'd say the people to *whack* here are those search-spammers
who are causing the problem and requiring Google's defense," says Finkelstein.

But all factors, once known, are susceptible to gaming, and perhaps no one search engine can ever hope to win an arms race against unscrupulous and determined spammers. Although calls have increased for Google to be regulated, perhaps the best defense is simply common sense: other search engines deliver surprising results that Google can't, and a wise browser will use a combination of tools. It certainly helps to shop around.

Or ask a librarian. ®
