Google's 'post holiday' Caffeine shot still brewing

Worldwide roll-out in 'coming months'

The web is still waiting for the worldwide roll-out of Google's next-generation search infrastructure, the mysterious indexing system overhaul known as "Caffeine."

A recent Wired profile of Google's search team indicates that Caffeine has already been deployed. But it seems the technology is still limited to a single data center, and though Google had planned to roll it out to other facilities after the New Year, this has yet to happen.

According to Search Engine Land, a Google spokesperson says that Caffeine will roll out across the company's global network of data centers "over the coming months." Previously, über-Googler Matt Cutts had indicated that Caffeine would be rolled out to multiple data centers "after the holidays," meaning after the first of the year. And we're now two months on from January 1.

In early November, after testing Caffeine in a public sandbox for several weeks, Cutts indicated the platform would soon be rolled out to a single data center for use on the company's live search engine and that the company would follow suit with other data centers in a matter of weeks.

"Caffeine will go live at one data center so that we can continue to collect data and improve the technology, but I don’t expect Caffeine to go live at additional data centers until after the holidays are over," Cutts wrote on November 10. "Most searchers wouldn’t immediately notice any changes with Caffeine, but going slowly not only gives us time to collect feedback and improve, but will also minimize the stress on webmasters during the holidays."

Google did not immediately respond to our requests for comment. But that Google spokesperson tells Search Engine Land that the company expects to "roll [Caffeine] out to all data centers over the coming months." The company operates roughly 36 custom-built data centers across the globe.

"We run lots of tests with this big a change [sic] to our infrastructure," the spokesperson says. "We want the new system to meet or exceed the abilities of our current system, and it can take time to ensure that everything looks good."

It should be noted that Cutts never gave an exact date for the roll-out. He merely said it would not happen until after the holidays and - subsequently - "until at least January."

Caffeine continues to run in that single data center. In late November, according to Search Engine Roundtable, Cutts said that the Google IP address 209.85.225.103 was hitting that single Caffeinated data center 50 per cent of the time, and it appears Google search-engine IPs are still mapping to the same data center.

"The data center remains the same," the Google spokesperson tells Search Engine Land, "but different IP addresses can map to different data centers at different times due to how Google manages its traffic. Because of how Google employs custom load-balancing, there is not a single IP address that will always reach the Caffeine data center."

Cutts first unveiled Caffeine - at least partially - in August with a post to the official Google Webmaster Central blog, calling it a "secret project" to build the "next-generation architecture for Google's web search," before pointing users to a sandbox where they could test it. Speaking with The Reg days later, he called it "a fundamental re-architecting" of Google's search indexing system.

"It's larger than a revamp," he told us. "It's more along the lines of a rewrite. And it's really great. It gives us a lot more flexibility, a lot more power. The ability to index more documents. Indexing speeds - that is, how quickly you can put a document through our indexing system and make it searchable - is much, much better."

This is not a change to Google's search philosophy. It's not a change to its famous search algorithms. It's a change to the way the company builds its index of all known websites and the metadata needed to describe them - the index that the algorithms rely on. "The new infrastructure sits 'under the hood' of Google's search engine," read Cutts' original blog post, "which means that most users won't notice a difference in search results."

After interviews with Google's search team, Wired's Steven Levy described Caffeine as something that makes it even easier for engineers to add "signals" - i.e. "contextual clues that help the search engine rank the millions of possible results to any query, ensuring that the most useful ones float to the top."

Cutts confirmed with The Reg that, as we had reported earlier, Caffeine includes an overhaul of the company's distributed Google File System, or GFS. A technology two years in the making, the so-called GFS2 is a significant departure from the original Google File System that debuted almost ten years ago and now drives services across the Google empire.

With GFS, a master node oversees data that's spread across a series of distributed chunkservers - an architecture that's not exactly suited to apps that require low latency, such as YouTube and Gmail. That lone master is a single point of failure. To solve this problem, GFS2 uses not only distributed slaves, but distributed masters as well.
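
For readers who want to see why a lone master matters, here is a toy sketch in Python. It is not Google's code and makes no claim about how GFS or GFS2 actually work internally: the class names, the `reachable` helper, and the hash-based split of file names across masters are all assumptions made up for illustration. It only shows the general idea that when one master holds all the metadata, losing it makes every file unfindable, whereas spreading metadata across several masters limits the damage. Note that this sketch partitions metadata but does not replicate it, so files owned by the failed master are still lost to clients.

```python
import hashlib


class Master:
    """Hypothetical master: holds metadata only (file -> chunkserver names)."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.locations = {}

    def register(self, filename, chunkservers):
        self.locations[filename] = list(chunkservers)

    def locate(self, filename):
        # A dead master cannot answer any metadata request.
        if not self.alive:
            raise RuntimeError(f"{self.name} is down")
        return self.locations[filename]


class SingleMasterCluster:
    """All metadata lives on one master: a single point of failure."""

    def __init__(self):
        self.masters = [Master("master-0")]

    def owner(self, filename):
        return self.masters[0]

    def register(self, filename, chunkservers):
        self.owner(filename).register(filename, chunkservers)

    def locate(self, filename):
        return self.owner(filename).locate(filename)


class MultiMasterCluster(SingleMasterCluster):
    """Metadata is split across masters by hashing the file name (an assumption)."""

    def __init__(self, count):
        self.masters = [Master(f"master-{i}") for i in range(count)]

    def owner(self, filename):
        digest = hashlib.sha256(filename.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % len(self.masters)
        return self.masters[index]


def reachable(cluster, filenames):
    """Return the files whose metadata can still be looked up."""
    found = []
    for name in filenames:
        try:
            cluster.locate(name)
        except RuntimeError:
            continue
        found.append(name)
    return found


if __name__ == "__main__":
    files = [f"video-{i}" for i in range(20)]
    single, multi = SingleMasterCluster(), MultiMasterCluster(4)
    for f in files:
        single.register(f, ["chunkserver-a", "chunkserver-b"])
        multi.register(f, ["chunkserver-a", "chunkserver-b"])

    # Knock out one master in each cluster and compare what is still findable.
    single.masters[0].alive = False
    multi.masters[0].alive = False
    print("single master:", len(reachable(single, files)), "of", len(files))
    print("four masters: ", len(reachable(multi, files)), "of", len(files))
```

In the single-master case nothing is reachable once the master is down. In the four-master case only the files that happened to hash to the failed master disappear; a real system would presumably add replication or failover on top, which this sketch leaves out.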

Cutts also said that Caffeine uses other back-end technologies recently developed by the company, but he declined to name them. He indicated that these did not include updates to MapReduce, Google's distributed number crunching platform, or BigTable, its distributed database.

Whatever new infrastructure technologies underpin Caffeine, they have not been deployed across other Google services. But Cutts indicated that Google hopes to do so with at least some of them. Google's distributed global infrastructure is designed to operate like a single machine, running all its online services. Certainly, GFS2 will be deployed across the Googlenet. ®
