
Google's 'post holiday' Caffeine shot still brewing

Worldwide roll-out in 'coming months'

The web is still waiting for the worldwide roll-out of Google's next-generation search infrastructure, the mysterious indexing system overhaul known as "Caffeine."

A recent Wired profile of Google's search team indicates that Caffeine has already been deployed. But it seems the technology is still limited to a single data center, and though Google had planned to roll it out to other facilities after the New Year, this has yet to happen.

According to Search Engine Land, a Google spokesperson says that Caffeine will roll out across the company's global network of data centers "over the coming months." Previously, über-Googler Matt Cutts had indicated that Caffeine would be rolled out to multiple data centers "after the holidays," meaning after the first of the year. And we're now two months on from January 1.

In early November, after testing Caffeine in a public sandbox for several weeks, Cutts indicated the platform would soon be rolled out to a single data center for use on the company's live search engine and that the company would follow suit with other data centers in a matter of weeks.

"Caffeine will go live at one data center so that we can continue to collect data and improve the technology, but I don’t expect Caffeine to go live at additional data centers until after the holidays are over," Cutts wrote on November 10. "Most searchers wouldn’t immediately notice any changes with Caffeine, but going slowly not only gives us time to collect feedback and improve, but will also minimize the stress on webmasters during the holidays."

Google did not immediately respond to our requests for comment. But that Google spokesperson tells Search Engine Land that the company expects to "roll [Caffeine] out to all data centers over the coming months." The company operates roughly 36 custom-built data centers across the globe.

"We run lots of tests with this big a change [sic] to our infrastructure," the spokesperson says. "We want the new system to meet or exceed the abilities of our current system, and it can take time to ensure that everything looks good."

It should be noted that Cutts never gave an exact date for the roll-out. He merely said it would not happen until after the holidays and - subsequently - "until at least January."

Caffeine continues to run in that single data center. In late November, according to Search Engine Roundtable, Cutts said that the Google IP address 209.85.225.103 was hitting that single Caffeinated data center 50 per cent of the time, and it appears Google search-engine IPs are still mapping to the same data center.

"The data center remains the same," the Google spokesperson tells Search Engine Land, "but different IP addresses can map to different data centers at different times due to how Google manages its traffic. Because of how Google employs custom load-balancing, there is not a single IP address that will always reach the Caffeine data center."
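To see why a single IP address might reach the Caffeine data center only some of the time, consider a toy model. This is a minimal sketch, not Google's load balancer: the data center names, the `route` function, and the hash-based choice are all assumptions made up for illustration. It simply shows one address landing on different data centers in different time slots, and how a hit rate like Cutts' "50 per cent" could be tallied.

```python
import hashlib

# Hypothetical data center names; Google's real topology is not public.
DATA_CENTERS = ["dc-caffeine", "dc-b", "dc-c"]


def route(ip: str, time_slot: int) -> str:
    """Pick a data center for (ip, time_slot).

    A stand-in for a real load balancer: the choice depends on both the
    address and the time slot, so the same IP can move between data centers.
    """
    digest = hashlib.sha256(f"{ip}|{time_slot}".encode()).digest()
    return DATA_CENTERS[digest[0] % len(DATA_CENTERS)]


def hit_rate(ip: str, target: str, slots: int) -> float:
    """Fraction of time slots in which `ip` lands on `target`."""
    hits = sum(route(ip, t) == target for t in range(slots))
    return hits / slots


if __name__ == "__main__":
    ip = "209.85.225.103"
    print(hit_rate(ip, "dc-caffeine", 1000))
```

In this toy model no address is pinned to any one data center, which is the spokesperson's point: the mapping is a property of the traffic manager at a given moment, not of the IP.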

Cutts first unveiled Caffeine - at least partially - in August with a post to the official Google Webmaster Central blog, calling it a "secret project" to build the "next-generation architecture for Google's web search," before pointing users to a sandbox where they could test it. Speaking with The Reg days later, he called it "a fundamental re-architecting" of Google's search indexing system.

"It's larger than a revamp," he told us. "It's more along the lines of a rewrite. And it's really great. It gives us a lot more flexibility, a lot more power. The ability to index more documents. Indexing speeds - that is, how quickly you can put a document through our indexing system and make it searchable - is much, much better."

This is not a change to Google's search philosophy. It's not a change to its famous search algorithms. It's a change to the way the company builds its index of all known websites and the metadata needed to describe them - the index that the algorithms rely on. "The new infrastructure sits 'under the hood' of Google's search engine," read Cutts' original blog post, "which means that most users won't notice a difference in search results."

After interviews with Google's search team, Wired's Steven Levy described Caffeine as something that makes it even easier for engineers to add "signals" - i.e. "contextual clues that help the search engine rank the millions of possible results to any query, ensuring that the most useful ones float to the top."

Cutts confirmed to The Reg that, as we had reported earlier, Caffeine includes an overhaul of the company's distributed Google File System, or GFS. A technology two years in the making, the so-called GFS2 is a significant departure from the original Google File System that debuted almost ten years ago and now drives services across the Google empire.

With GFS, a master node oversees data that's spread across a series of distributed chunkservers, an architecture that's not exactly suited to apps that require low latency, such as YouTube and Gmail. That lone master is a single point of failure. To solve this problem, GFS2 uses not only distributed slaves, but distributed masters as well.
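The difference between one master and many can be sketched in a few lines. What follows is an illustration under stated assumptions, not GFS or GFS2 code: Google has not published how GFS2 distributes its masters, so the `Master` and `DistributedMasters` classes, the hash-based sharding, and the two-copy replication are all hypothetical choices made for the example.

```python
import hashlib


class Master:
    """Holds chunk-location metadata: which chunkservers store which chunk."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.locations = {}  # chunk_id -> list of chunkserver names

    def register(self, chunk_id, servers):
        self.locations[chunk_id] = list(servers)

    def locate(self, chunk_id):
        # With a single master, this is the point where everything stops.
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return self.locations[chunk_id]


class DistributedMasters:
    """Hypothetical scheme: shard metadata across masters by hash and keep
    a second copy of each entry on the next master in the list."""

    def __init__(self, masters):
        self.masters = masters

    def _owners(self, chunk_id):
        digest = hashlib.sha256(chunk_id.encode()).digest()
        start = int.from_bytes(digest[:8], "big") % len(self.masters)
        return [self.masters[(start + i) % len(self.masters)] for i in range(2)]

    def register(self, chunk_id, servers):
        for master in self._owners(chunk_id):
            master.register(chunk_id, servers)

    def locate(self, chunk_id):
        # Fall through to the replica if the first owner is unavailable.
        for master in self._owners(chunk_id):
            if master.alive:
                return master.locate(chunk_id)
        raise ConnectionError("no live master holds this chunk")
```

Mark a lone `Master` as down and every `locate` call fails. Do the same to one member of a three-node `DistributedMasters` and lookups carry on via the replica, which is the single-point-of-failure problem in miniature.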

Cutts also said that Caffeine uses other back-end technologies recently developed by the company, but he declined to name them. He indicated that these did not include updates to MapReduce, Google's distributed number crunching platform, or BigTable, its distributed database.

Whatever new infrastructure technologies underpin Caffeine, they have not been deployed across other Google services. But Cutts indicated that Google hopes to do so with at least some of them. Google's distributed global infrastructure is designed to operate like a single machine, running all its online services. Certainly, GFS2 will be deployed across the Googlenet. ®
