Search engines stink, and they're getting worse

An academic report finds them all sadly wanting

The proportion of information on the Internet that is indexed by search engines is declining, according to a recent study by Steve Lawrence and Lee Giles of the NEC Research Institute at Princeton, reported in Nature. The engines do not index sites equally, and new pages may remain unindexed for months. Worst of all, even the best engine reaches only 16 per cent of the Web. The survey was carried out in February, and the situation is getting worse: in December 1997, around 34 per cent of information was indexed.

The problem is multifaceted. The Web holds around 15 terabytes of data in some 800 million pages, plus 180 million images, and is growing faster than the engines' ability to search it. The growth rate is about 3 million pages a day. There is apparently no coordination between search engine operators that would let the cognoscenti turn to a particular engine for a particular subject area. The dark side, of course, is that many pages that have made it into a search engine disappear without trace.

So how well are the best-known engines doing? Dismally, is the answer. The best, according to Lawrence and Giles, is Northern Light, which covers a mere 16 per cent of the Web, just pipping Alta Vista's 15.5 per cent (although that 0.5 per cent difference adds up to around 4 million unindexed pages). Microsoft can only manage 8.5 per cent, Yahoo 7.4 per cent and Excite 5.6 per cent, and Lycos is the dunce at 2.5 per cent. This should make people think twice about defaulting to the engines on portals. Of course, enlightened searchers use meta-engines that query several engines and combine the results, but we have found that each has its inconveniences and idiosyncrasies.

The researchers found that 83 per cent of web sites have commercial content, with only a vociferous 1.5 per cent of sites being pornographic: they, at least, appear to have found the secrets of tweaking their sites to get them indexed.

It looks as though archivists will not be out of a job for a long time, in view of this failure of the Web as a reliable and comprehensive online library. It reminds us of the persistent story that the French Bibliotheque Nationale used to store its books by size and colour in its old building. That's effectively what's happening on the Web: we don't know how much is unindexed, because it is hard to study the overlap between engines, but the odds are that half the information on the Web cannot be found with search engines at all. ®
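
As a rough check on the figures quoted above, the coverage percentages can be turned into approximate page counts. The sketch below is illustrative only: it assumes the study's estimate of 800 million pages and the per-engine percentages exactly as reported in this article, and the names `TOTAL_PAGES`, `COVERAGE_PERCENT`, `pages_indexed` and `coverage_gap` are made up for the example.

```python
# Figures as quoted in the article; nothing here comes from the study itself.
TOTAL_PAGES = 800_000_000  # estimated size of the Web in pages

COVERAGE_PERCENT = {
    "Northern Light": 16.0,
    "Alta Vista": 15.5,
    "Microsoft": 8.5,
    "Yahoo": 7.4,
    "Excite": 5.6,
    "Lycos": 2.5,
}


def pages_indexed(percent, total=TOTAL_PAGES):
    """Approximate number of pages covered at a given percentage of the Web."""
    return round(total * percent / 100)


def coverage_gap(engine_a, engine_b):
    """Difference in approximate page counts between two engines."""
    return pages_indexed(COVERAGE_PERCENT[engine_a]) - pages_indexed(
        COVERAGE_PERCENT[engine_b]
    )


if __name__ == "__main__":
    # List engines from highest to lowest reported coverage.
    ranked = sorted(COVERAGE_PERCENT.items(), key=lambda item: item[1], reverse=True)
    for engine, percent in ranked:
        millions = pages_indexed(percent) / 1_000_000
        print(f"{engine:<15} {percent:>5.1f} per cent  ~{millions:.0f} million pages")
```

On these assumptions, Northern Light's 16 per cent works out to about 128 million pages, and `coverage_gap("Northern Light", "Alta Vista")` gives the 4 million page difference mentioned above.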
