Google bug blocks thousands of sites

Choking on spam, noise - bots are 'ready to give up'


Google, like the rest of us, seems to be fighting a losing battle to make sense of a rising tide of Internet garbage. But a programming error by the search engine has compounded the problem by inadvertently blocking thousands of sites from Google users.

It's been called a "GoogleNACK": you enter a particular search term and Google tells you that there are thousands of matching results, but fails to return many, or any, of them.

For example, a search for keyboard bracelet returns just five sites out of "about 49,900". (Your mileage may vary, as Google results differ depending on where you are, and which way the Segway scooters are pointing - but it's a fairly typical figure.)

What's happening? Award-winning researcher Seth Finkelstein has a theory. Google's own spam filters, designed to weed out link farms created by pornographers and spammers and Scientologists, are crude, and are blocking many innocent sites.

"Technical solutions may have unintended consequences," he says.

"When Google searches for combinations of terms, pages with the terms close to each other are ranked highly. Such pages are also unfortunately often search spam pages, using a mishmash of keywords. Thus, an unusual combination of words (and a dedicated spammer) will bring spam pages near the top of the results for certain keyword searches."
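
Finkelstein's point about proximity can be illustrated with a toy scorer. The sketch below is not Google's ranking code, which is not public; the function names min_span and proximity_score, the scoring formula and both sample texts are assumptions made up for illustration. It simply rewards pages where all the query terms fall inside a small window of words, which is exactly what a keyword-stuffed page achieves.

```python
import re

def min_span(text, terms):
    """Smallest window, in words, that contains every term; None if a term is missing."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    wanted = {t.lower() for t in terms}
    last_seen = {}  # most recent position of each wanted term
    best = None
    for i, word in enumerate(words):
        if word in wanted:
            last_seen[word] = i
            if len(last_seen) == len(wanted):
                # Shortest window ending here starts at the oldest "last seen" position.
                span = i - min(last_seen.values()) + 1
                if best is None or span < best:
                    best = span
    return best

def proximity_score(text, terms):
    """Hypothetical score: 1.0 when the terms are adjacent, falling as they spread out."""
    span = min_span(text, terms)
    if span is None:
        return 0.0
    return len({t.lower() for t in terms}) / span

QUERY = ["keyboard", "bracelet"]
# Illustrative texts only, not real pages.
HONEST_PAGE = "I bought a bracelet last week. Separately, my keyboard broke and needed repair."
SPAM_PAGE = "cheap keyboard bracelet casino keyboard bracelet pills"

if __name__ == "__main__":
    print("honest:", proximity_score(HONEST_PAGE, QUERY))
    print("spam:  ", proximity_score(SPAM_PAGE, QUERY))
```

Under this toy measure the keyword mishmash outscores the honest page, so any filter aimed at such pages has to look at more than proximity, and a crude one risks catching innocent sites along the way.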

Perfect storm

One such spammer is Elwyn Jenkins, a former e-currency evangelist now based in Australia, who touted a pamphlet called "Make Money Online" - which boasted that "Dr. Jenkins has pioneered a unique approach to using Google and blogs to build traffic." Jenkins ran a link farm spanning the domains www.microdoc-news.info, www.microdocs-news.info, smoogle.info, googlevillage.info, blogging-news.info, googlology.info, microdoc.bloki.com, www.question-factory.com, meeting-mentor.blogspot.com, radio.weblogs.com/0111745, verityintellectualproperties.com, textchunk.info, personalbrain.info, technacy.info, verity-ip.com, bloggers-news.info and ...

well, you get the picture. His Googlephilia was returned in kind by bloggers, who pumped up his PageRank™ (PageRank™'s fatal flaw was incestuous linking) by linking to him approvingly. So creating a perfect storm - and an almighty headache - for Google's algorithm overlords.
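
Why do a few approving links matter so much? A rough sketch, using the simplified textbook form of PageRank rather than whatever Google actually runs: the damping factor of 0.85, the iteration count and the tiny graphs with pages named "a", "b", "c", "target" and "farm1" to "farm5" are all assumptions for illustration. Each page splits its score among the pages it links to, so five throwaway pages pointing at one target are enough to lift it.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration over a dict of page -> list of outlinks."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # A page shares its damped rank equally among its outlinks.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outlinks spreads its rank over every page.
                for t in pages:
                    new[t] += damping * rank[p] / n
        rank = new
    return rank

# Hypothetical four-page web: "b" and "target" each get one link, both from "a".
WEB = {"a": ["b", "target"], "b": ["c"], "c": ["a"], "target": ["a"]}

# The same web plus five farm pages whose only job is to link to "target".
FARMED = dict(WEB)
FARMED.update({"farm%d" % i: ["target"] for i in range(1, 6)})

if __name__ == "__main__":
    before, after = pagerank(WEB), pagerank(FARMED)
    print("before: b=%.3f target=%.3f" % (before["b"], before["target"]))
    print("after:  b=%.3f target=%.3f" % (after["b"], after["target"]))
```

In the plain web "b" and "target" score the same; once the farm is added "target" pulls ahead even though nothing about its content has changed.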

The term GoogleNACK ('Negative ACKnowledgement') was coined by Gary Stock, CTO of Nexcerpt, a web clipping service that monitors thousands of news sources. Stock also coined the phrase Googlewhack, sharing his research with Google.

In an effort to weed out the noise, Google constantly refines its weighting algorithm, which it says is a combination of a hundred different factors. In an attempt to thwart deliberate gaming by link farms and blog noise (exacerbated by lossy software gimmicks such as 'Trackbacks', which generate reams of content-free pages for Google's crawlers), Google has stepped back from its trademarked PageRank™ method and instead emphasized more traditional factors such as anchor text.

"I'd say the people to *whack* here are those search-spammers who are causing the problem and requiring Google's defense," says Finkelstein.

But all factors, once known, are susceptible to gaming, and perhaps no one search engine can ever hope to win an arms race against unscrupulous and determined spammers. Although calls have increased for Google to be regulated, perhaps the best defense is simply common sense: other search engines deliver surprising results that Google can't, and a wise browser will use a combination of tools. It certainly helps to shop around.

Or ask a librarian. ®
