Google Percolator – global search jolt sans MapReduce comedown

The machine that brews the Caffeine

Google Caffeine — the revamped search infrastructure recently rolled out across Google's worldwide network of data centers — is based on a distributed data-processing system known as Percolator. Designed by Google and, until now, jealously guarded by Google, Percolator is a platform for "incremental processing" — a means of continually updating the company's epic search index without reprocessing the entire thing from scratch.

As Google senior director of engineering Eisar Lipkovitz told The Register earlier this month, the new platform is a speedier alternative to MapReduce, the distributed number-crunching platform that underpinned the company's previous indexing system. Two New York-based Google engineers — Daniel Peng and Frank Dabek — discuss the platform at length in a paper they are scheduled to present at the annual USENIX Symposium on Operating Systems Design and Implementation (OSDI) next month in Vancouver.

"MapReduce and other batch-processing systems cannot process small updates individually as they rely on creating large batches for efficiency," the paper reads. "We have built Percolator, a system for incrementally processing updates to a large data set, and deployed it to create the Google web search index. By replacing a batch-based indexing system with an indexing system based on incremental processing using Percolator, we process the same number of documents per day, while reducing the average age of documents in Google search results by 50%."

Speaking with The Register, Lipkovitz compared the system to classic database programming and the use of "database triggers." Because the index can be updated incrementally, the median document moves through Caffeine over 100 times faster than it moved through the company's old MapReduce setup. As the paper puts it: "The Percolator-based indexing system (known as Caffeine), crawls the same number of documents, but we feed each document through Percolator as it is crawled. The immediate advantage, and main design goal, of Caffeine is a reduction in latency."
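To make the "database trigger" comparison concrete, here is a minimal single-process sketch in Python. It is not Percolator's API: the `Store` class, the `observe` decorator, and the `reindex` and `rebuild_from_scratch` functions are all hypothetical names invented for illustration. The point is only that a write to one document fires a callback that touches that document's index entries, rather than rebuilding the whole index.

```python
from collections import defaultdict


class Store:
    """Toy document store that fires callbacks on every write (hypothetical)."""

    def __init__(self):
        self.rows = {}        # url -> latest crawled text
        self.observers = []   # callbacks run after each write, trigger-style

    def observe(self, fn):
        # Register a callback; returned unchanged so it works as a decorator.
        self.observers.append(fn)
        return fn

    def write(self, url, text):
        old = self.rows.get(url)
        self.rows[url] = text
        for fn in self.observers:
            fn(url, old, text)


store = Store()
postings = defaultdict(set)   # term -> set of urls containing that term


@store.observe
def reindex(url, old_text, new_text):
    # Incremental step: only the terms that changed for this one document.
    old_terms = set(old_text.lower().split()) if old_text else set()
    new_terms = set(new_text.lower().split())
    for term in old_terms - new_terms:
        postings[term].discard(url)
    for term in new_terms - old_terms:
        postings[term].add(url)


def rebuild_from_scratch(rows):
    # Batch alternative: reprocess every document to rebuild the index.
    fresh = defaultdict(set)
    for url, text in rows.items():
        for term in set(text.lower().split()):
            fresh[term].add(url)
    return fresh


store.write("a.example", "fresh coffee news")
store.write("b.example", "coffee machine")
store.write("a.example", "stale coffee news")   # re-crawl of one page
```

After the third write, only the entries for "fresh" and "stale" change. The batch function would produce the same index, but only by walking every stored document again.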

In the past, Google's search index, an index of the entire web, was built with a series of batch operations. The MapReduce platform "maps" tasks across a vast collection of distributed machines, splitting them into tiny sub-tasks, before "reducing" the results into one master calculation. Google's webcrawlers would supply the raw data (the webpages and weblinks) and MapReduce would process this data, determining, among other things, each site's PageRank, that famous measure of how many other sites link to it.
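The map-then-reduce pattern can be sketched in a few lines of Python. This is a single-process toy, not Google's MapReduce, and it counts raw inbound links as a crude stand-in for PageRank, which also weights each link by the rank of the linking page. The function names and the `crawl` data are assumptions made up for the example.

```python
from collections import defaultdict


def map_links(page, outlinks):
    # Map step: emit one (target, 1) pair for every link found on a page.
    for target in outlinks:
        yield target, 1


def shuffle(pairs):
    # Group emitted values by key, as the framework would between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(target, ones):
    # Reduce step: collapse all values for one key into a single total.
    return target, sum(ones)


# Hypothetical crawl output: page -> pages it links to.
crawl = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
}

pairs = [kv for page, links in crawl.items() for kv in map_links(page, links)]
inbound = dict(reduce_counts(t, ones) for t, ones in shuffle(pairs).items())
```

Every page in `crawl` passes through the map step on every run, even if only one page changed since the last run. That is the batch cost the incremental approach is meant to avoid.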
