Google bug blocks thousands of sites

Choking on spam, noise - 'Bots are 'ready to give up'

Google, like the rest of us, seems to be fighting a losing battle to make sense of a rising tide of Internet garbage. But a programming error by the search engine has compounded the problem by inadvertently blocking thousands of sites from Google users.

It's been called a "GoogleNACK": you enter a particular search term and Google tells you that there are thousands of matching results, but fails to return many of them, or indeed any.

For example, a search for "keyboard bracelet" returns just five sites out of "about 49,900". (Your mileage may vary, as Google results differ depending on where you are, and which way the Segway scooters are pointing - but it's a fairly typical figure.)

What's happening? Award-winning researcher Seth Finkelstein has a theory. Google's own spam filters, designed to weed out link farms created by pornographers, spammers and Scientologists, are crude and are blocking many innocent sites.

"Technical solutions may have unintended consequences," he says.

"When Google searches for combinations of terms, pages with the terms close to each other are ranked highly. Such pages are also unfortunately often search spam pages, using a mishmash of keywords. Thus, an unusual combination of words (and a dedicated spammer) will bring spam pages near the top of the results for certain keyword searches."
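Finkelstein's point about term proximity is easy to see in a toy model. The sketch below is a hypothetical scoring function, not Google's actual algorithm: it scores a page by the size of the smallest run of words that contains every query term, so terms sitting side by side score highest. The function names and sample pages are invented for illustration. A keyword-stuffed page that repeats "keyboard bracelet" verbatim beats a genuine page that mentions both words a sentence apart.

```python
import re
from typing import Optional


def tokenize(text: str) -> list:
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def smallest_window(tokens: list, terms: set) -> Optional[int]:
    """Length of the shortest run of tokens containing every query term,
    or None if some term never appears (sliding-window technique)."""
    if not terms:
        return None
    counts = {}
    best = None
    left = 0
    for right, tok in enumerate(tokens):
        if tok in terms:
            counts[tok] = counts.get(tok, 0) + 1
        # Shrink the window from the left while it still covers all terms.
        while len(counts) == len(terms):
            width = right - left + 1
            if best is None or width < best:
                best = width
            ltok = tokens[left]
            if ltok in terms:
                counts[ltok] -= 1
                if counts[ltok] == 0:
                    del counts[ltok]
            left += 1
    return best


def proximity_score(page: str, query: str) -> float:
    """Hypothetical score in (0, 1]: 1.0 when the terms are adjacent,
    smaller as they drift apart, 0.0 if any term is missing."""
    terms = set(tokenize(query))
    window = smallest_window(tokenize(page), terms)
    return 0.0 if window is None else len(terms) / window


spam = "keyboard bracelet keyboard bracelet cheap keyboard bracelet"
genuine = ("This bracelet is made from old keys. "
           "I salvaged them from a broken keyboard last year.")

print(proximity_score(spam, "keyboard bracelet"))
print(proximity_score(genuine, "keyboard bracelet"))
```

A real engine weighs proximity alongside many other signals; the point here is only that a proximity signal on its own rewards exactly the keyword mishmash Finkelstein describes.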

Perfect storm

One such example is Elwyn Jenkins, a spammer and former e-currency evangelist now based in Australia, who touted a pamphlet called "Make Money Online", which boasted that "Dr. Jenkins has pioneered a unique approach to using Google and blogs to build traffic." Jenkins ran a link farm spanning the domains www.microdoc-news.info, www.microdocs-news.info, smoogle.info, googlevillage.info, blogging-news.info, googlology.info, microdoc.bloki.com, www.question-factory.com, meeting-mentor.blogspot.com, radio.weblogs.com/0111745, verityintellectualproperties.com, textchunk.info, personalbrain.info, technacy.info, verity-ip.com, bloggers-news.info and ...

well, you get the picture. His Googlephilia was returned in kind by bloggers, who pumped up his PageRank™ by linking to him approvingly (PageRank™'s fatal flaw was incestuous linking). The result was a perfect storm - and an almighty headache - for Google's algorithm overlords.
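The "incestuous linking" weakness can be reproduced in a toy model. The sketch below runs the textbook PageRank power iteration, with the commonly cited damping factor of 0.85, over a small made-up link graph; it is not Google's production ranking, and the page names ("honest", "target", "farm1" and so on) are hypothetical stand-ins rather than the domains listed above. Two pages start out with identical scores; then five farm pages that link to one of them, and to each other, are added.

```python
def pagerank(links: dict, damping: float = 0.85, iterations: int = 100) -> dict:
    """Textbook PageRank by power iteration.

    `links` maps each page to the list of pages it links to. Pages with no
    outgoing links (dangling pages) spread their rank evenly over all pages.
    """
    pages = sorted(set(links) | {p for outs in links.values() for p in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share first.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            outs = links.get(page, [])
            if outs:
                share = damping * rank[page] / len(outs)
                for target in outs:
                    new[target] += share
            else:
                for target in pages:
                    new[target] += damping * rank[page] / n
        rank = new
    return rank


# A tiny, symmetric "honest" web: honest and target are indistinguishable.
base = {
    "a": ["honest", "target"],
    "b": ["honest", "target"],
    "honest": ["a", "b"],
    "target": ["a", "b"],
}

# Add a hypothetical farm: each farm page links to target and to every other farm page.
farm = [f"farm{i}" for i in range(1, 6)]
farmed = dict(base)
for page in farm:
    farmed[page] = ["target"] + [other for other in farm if other != page]

base_rank = pagerank(base)
farmed_rank = pagerank(farmed)
print(base_rank["honest"], base_rank["target"])
print(farmed_rank["honest"], farmed_rank["target"])
```

In this toy graph the farm lifts "target" to roughly 0.21 against about 0.17 for "honest", even though nothing about the target page itself changed; the farm pages merely recycle rank among themselves and funnel it onward.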

The term GoogleNACK ('Negative ACKnowledgement') was coined by Gary Stock, CTO of Nexcerpt, a web clipping service that monitors thousands of news sources. Stock, who also coined the term Googlewhack, has shared his research with Google.

In an effort to weed out the noise, Google constantly refines its weighting algorithm, which it says combines a hundred different factors. To thwart deliberate gaming by link farms and blog noise (exacerbated by lossy software gimmicks such as 'Trackbacks', which generate reams of content-free pages for Google's crawlers), Google has stepped back from its trademarked PageRank™ method and instead emphasized more traditional factors such as anchor text.

"I'd say the people to *whack* here are those search-spammers who are causing the problem and requiring Google's defense," says Finkelstein.

But all factors, once known, are susceptible to gaming, and perhaps no one search engine can ever hope to win an arms race against unscrupulous and determined spammers. Although calls have increased for Google to be regulated, perhaps the best defense is simply common sense: other search engines deliver surprising results that Google can't, and a wise searcher will use a combination of tools. It certainly helps to shop around.

Or ask a librarian. ®