
Search pioneers join Yahoo! - but is the web beyond search?

Clever move - too late?

Few visitors to IBM's Almaden research lab in 1999 and 2000 can fail to have been impressed by its lead in web search. IBM's Clever project both predated and informed what became Google: Brin and Page cited the Almaden work in their 1998 paper The Anatomy of a Large-Scale Hypertextual Web Search Engine. Google drew on the same concept of using the link structure to infer quality and authority, which it was to trademark and market as PageRank™.
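For readers who want the gist of "using the link structure to infer quality and authority", here is a minimal sketch. It is not Google's production algorithm, nor Clever's: it is a simplified power iteration in the spirit of the published PageRank idea, and the function name `link_rank`, the damping value, the iteration count and the four-page toy web are all assumptions made for illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by who links to them.

    links: dict mapping a page name to the list of pages it links to.
    Returns a dict of page -> score; scores sum to 1.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with every page equally trusted.
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page keeps a small baseline score regardless of links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score everywhere.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical toy web: three pages link to "c", and "c" links back to "a".
toy_web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = link_rank(toy_web)
```

On this toy web the heavily linked page "c" ends up on top, and "a" outranks "b" and "d" only because "c" vouches for it. That is also why the scheme is so tempting to game: anyone who can manufacture pages can manufacture links.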

But the Clever team was already thinking way beyond PageRank™. Your reporter was one such visitor more than five years ago and was struck by the scope and depth of the work. For example, in 1998 the Clever team was publishing its research into hierarchical topic taxonomies, and inferring web communities. Today, such subjects are presented to conferences of former HTML coders (today's wiki-fiddlers) who appear to be hearing the topics for the first time, such is their wide-eyed wonderment.

Working within IBM also allowed the team to draw on its rich history of database research and linguistic analysis, and at IBM you try not to lose your customers' data.

Google's fate is well known. After last year's IPO it became one of the wealthiest technology companies on the planet, and its founders are billionaires.

And Clever?

Well, IBM appeared to have some inkling that the project was valuable to it. A spin-off was discussed, but never followed through, and IBM officially welcomed licensees at one stage. But Clever was never allowed the opportunity to compete directly with the commercial search rivals, so we never really saw its potential.

Clever's trajectory in some ways mirrors that of IBM's relational database work. With its System R project, IBM had built the first implementation of the relational database in the 1970s, but bureaucratic infighting hampered the researchers' desire to turn it into a real product for IBM's customers. Ingres was first to get an RDBMS out of the door, and Oracle's single-minded marketing won it big inroads into the new market in the 1980s.

"We were convinced IBM would never ship," Jim Gray later recalled (in one of the best oral histories of a computer project on the net).

Now, however, Yahoo! has hired several of the Clever team and plans to recruit more.

Last week the New York Times reported that Prabhakar Raghavan, one-time project leader, had been recruited from Verity, where he was chief scientist and CTO. Another staffer, Andrew Tomkins, is also on his way to Yahoo!, the Times reported.

These guys have their work cut out.

Web chaff beyond sorting?

"The World Wide Web of today is dramatically different from that of just five years ago," the team noted in 1999. "Predicting what it will be like in another five years (2004) seems futile. Will even the basic act of indexing the Web soon become infeasible?"

For a few years, it looked an improbably pessimistic question. But pessimists make the best engineers in the long run, and this now seems prescient.

Google's link-based algorithms were soon imitated by rivals, and as a consequence all of today's search engines must now mine a web stuffed with synthetic documents of little relevance to anyone, many of which are generated by machines on behalf of the customers of the more unscrupulous SEOs (Search Engine Optimizers).

It's an algorithm arms race, and the SEOs themselves know the scale of the problem they nurtured. Some estimate as much as a third of the web is fake, machine-generated pages, and Google can't really tell which third it is. Meanwhile, neither Yahoo!, Google nor MSN can yet offer the most basic improvement on what AltaVista offered in 1996: queries sorted by date. Want a listing of Tony Blair's comments about Iraq published between June and August 2003? Forget it. AltaVista could do this then, and still can, but none of the big three can match this most basic of requests.

Because rigging the search engines is so profitable, the junk web, or "Web 2.0" as it's called, proliferates and mutates like a superbug. Each new solution to the problem is quickly co-opted by spammers and gamers. For example, last year's "tagging" craze is becoming this year's mortgage and Viagra scam.

Some maintain the web's problems can't be solved technically, but only politically or economically, for example by the application of compensation models which allow the really good data hoarded by database holders to be opened to the public at last. That may prove to be true: there are many flavors of private and public networks, we use a mixture every day, and that mixture will change over time.

The reassembled Clever team at Yahoo! may not even be offered a chance to answer the question.

The Times reports that the team itself is being directed to searching digital media, and hints that some areas of their earlier work remain IBM's intellectual property.

By some irony, we note that one of Sergey Brin's student projects was also searching digital media, only as a kind of RIAA enforcer. The system he developed was for the "automated detection of copyright violations", and was unfortunately called COPS (the COpyright Protection System). Fortunately, Sergey was more interested in developing a general purpose data mining application.

Would he make the same choice today?

Surely something must be done to renew the original raison d'etre behind both Google and Yahoo! - finding good stuff. The world in which an "I'm Feeling Lucky" button was even conceivable seems to belong to a distant past.

Google would rather sell you a shirt on Froogle, and Yahoo! would rather show you the way to the Coliseum, offering you a package tour that includes the ticket admission. And the former search leader's priorities seem to be elsewhere. In recent months Google has patented a widely used business method and beefed up its DC lobbying muscle, and last week's legal dispute over the hiring of a "search expert" by Google from Microsoft sounded thoroughly phoney and synthetic on both sides.

The Clever team that Yahoo! is reassembling are the genuine article. Perhaps if the management permits them, they'll be able to answer the question -

Whatever happened to search?®
