Feeds

Google Percolator – global search jolt sans MapReduce comedown

The machine that brews the Caffeine

Securing Web Applications Made Simple and Scalable

Google Caffeine — the revamped search infrastructure recently rolled out across Google's worldwide network of data centers — is based on a distributed data-processing system known as Percolator. Designed by Google and, until now, jealously guarded by Google, Percolator is a platform for "incremental processing" — a means of continually updating the company's epic search index without reprocessing the entire thing from scratch.

As Google senior director of engineering Eisar Lipkovitz told The Register earlier this month, the new platform is a speedier alternative to MapReduce, the distributed number-crunching platform that underpinned the company's previous indexing system. Two New York-based Google engineers — Daniel Peng and Frank Dabek — discuss the platform at length in a paper they are scheduled to present at the annual USENIX Symposium on Operating Systems Design and Implementation (OSDI) next month in Vancouver.

"MapReduce and other batch-processing systems cannot process small updates individually as they rely on creating large batches for efficiency," the paper reads. "We have built Percolator, a system for incrementally processing updates to a large data set, and deployed it to create the Google web search index. By replacing a batch-based indexing system with an indexing system based on incremental processing using Percolator, we process the same number of documents per day, while reducing the average age of documents in Google search results by 50%."

Speaking with The Register, Lipkovitz compared the system to classic database programming and the use of "database triggers." Because the index can be updated incrementally, the median document moves through Caffeine over 100 times faster than it moved through the company's old MapReduce setup. "The Percolator-based indexing system (known as Caffeine), crawls the same number of documents, but we feed each document through Percolator as it is crawled. The immediate advantage, and main design goal, of Caffeine is a reduction in latency."

“By replacing a batch-based indexing system with an indexing system based on incremental processing using Percolator, we process the same number of documents per day, while reducing the average age of documents in Google search results by 50%.”

In the past, Google's search index — an index of the entire web — was built with a series of batch operations. The MapReduce platform "maps" tasks across a vast collection of distributed machines, splitting them into tiny sub-tasks, before "reducing" the results into one master calculation. Google's webcrawlers would supply the raw data — the webpages and weblinks — and MapReduce would process this data, determining, among other things, each site's PageRank, that famous measure of how many other sites it links to.

The Essential Guide to IT Transformation

Next page: MapReduce reduced

More from The Register

next story
EU's top data cops to meet Google, Microsoft et al over 'right to be forgotten'
Plan to hammer out 'coherent' guidelines. Good luck chaps!
Manic malware Mayhem spreads through Linux, FreeBSD web servers
And how Google could cripple infection rate in a second
FLAPE – the next BIG THING in storage
Find cold data with flash, transmit it from tape
Seagate chances ARM with NAS boxes for the SOHO crowd
There's an Atom-powered offering, too
Intel teaches Oracle how to become the latest and greatest Xeon Whisperer
E7-8895 v2 chips are best of the bunch, and with firmware-unlocked speed control
Gartner: To the right, to the right – biz sync firms who've won in a box to the right...
Magic quadrant: Top marks for, er, completeness of vision, EMC
prev story

Whitepapers

Top three mobile application threats
Prevent sensitive data leakage over insecure channels or stolen mobile devices.
The Essential Guide to IT Transformation
ServiceNow discusses three IT transformations that can help CIO's automate IT services to transform IT and the enterprise.
Mobile application security vulnerability report
The alarming realities regarding the sheer number of applications vulnerable to attack, and the most common and easily addressable vulnerability errors.
How modern custom applications can spur business growth
Learn how to create, deploy and manage custom applications without consuming or expanding the need for scarce, expensive IT resources.
Consolidation: the foundation for IT and business transformation
In this whitepaper learn how effective consolidation of IT and business resources can enable multiple, meaningful business benefits.