Google's 'post holiday' Caffeine shot still brewing

Worldwide roll-out in 'coming months'

The web is still waiting for the worldwide roll-out of Google's next-generation search infrastructure, the mysterious indexing system overhaul known as "Caffeine."

A recent Wired profile of Google's search team indicates that Caffeine has already been deployed. But it seems the technology is still limited to a single data center, and though Google had planned to roll it out to other facilities after the New Year, this has yet to happen.

According to Search Engine Land, a Google spokesperson says that Caffeine will roll out across the company's global network of data centers "over the coming months." Previously, über-Googler Matt Cutts had indicated that Caffeine would be rolled out to multiple data centers "after the holidays," meaning after the first of the year. And we're now two months on from January 1.

In early November, after testing Caffeine in a public sandbox for several weeks, Cutts indicated the platform would soon be rolled out to a single data center for use on the company's live search engine and that the company would follow suit with other data centers in a matter of weeks.

"Caffeine will go live at one data center so that we can continue to collect data and improve the technology, but I don’t expect Caffeine to go live at additional data centers until after the holidays are over," Cutts wrote on November 10. "Most searchers wouldn’t immediately notice any changes with Caffeine, but going slowly not only gives us time to collect feedback and improve, but will also minimize the stress on webmasters during the holidays."

Google did not immediately respond to our requests for comment. But that Google spokesperson tells Search Engine Land that the company expects to "roll [Caffeine] out to all data centers over the coming months." The company operates roughly 36 custom-built data centers across the globe.

"We run lots of tests with this big a change [sic] to our infrastructure," the spokesperson says. "We want the new system to meet or exceed the abilities of our current system, and it can take time to ensure that everything looks good."

It should be noted that Cutts never gave an exact date for the roll-out. He merely said it would not happen until after the holidays and - subsequently - "until at least January."

Caffeine continues to run in that single data center. In late November, according to Search Engine Roundtable, Cutts said that the Google IP address 209.85.225.103 was hitting that single Caffeinated data center 50 per cent of the time, and it appears Google search-engine IPs are still mapping to the same data center.

"The data center remains the same," the Google spokesperson tells Search Engine Land, "but different IP addresses can map to different data centers at different times due to how Google manages its traffic. Because of how Google employs custom load-balancing, there is not a single IP address that will always reach the Caffeine data center."

Cutts first unveiled Caffeine - at least partially - in August with a post to the official Google Webmaster Central blog, calling it a "secret project" to build the "next-generation architecture for Google's web search," before pointing users to a sandbox where they could test it. Speaking with The Reg days later, he called it "a fundamental re-architecting" of Google's search indexing system.

"It's larger than a revamp," he told us. "It's more along the lines of a rewrite. And it's really great. It gives us a lot more flexibility, a lot more power. The ability to index more documents. Indexing speeds - that is, how quickly you can put a document through our indexing system and make it searchable - is much, much better."

This is not a change to Google's search philosophy. It's not a change to its famous search algorithms. It's a change to the way the company builds its index of all known websites and the metadata needed to describe them - the index that the algorithms rely on. "The new infrastructure sits 'under the hood' of Google's search engine," read Cutts' original blog post, "which means that most users won't notice a difference in search results."

After interviews with Google's search team, Wired's Steven Levy described Caffeine as something that makes it even easier for engineers to add "signals" - i.e. "contextual clues that help the search engine rank the millions of possible results to any query, ensuring that the most useful ones float to the top."

Cutts confirmed with The Reg that, as we had reported earlier, Caffeine includes an overhaul of the company's distributed Google File System, or GFS. A technology two years in the making, the so-called GFS2 is a significant departure from the original Google File System that debuted almost ten years ago and now drives services across the Google empire.

With GFS, a master node oversees data that's spread across a series of distributed chunkservers - an architecture that's not exactly suited to apps that require low latency, such as YouTube and Gmail. That lone master is a single point of failure. To solve this problem, GFS2 uses not only distributed slaves, but distributed masters as well.
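
To make the difference concrete, here is a toy sketch in Python. It is not Google's code and it assumes a great deal: Google has not said how GFS2 divides work among its masters, so the hash-based split below is purely an illustrative guess, and the names `Master`, `MasterPool`, and `reachable` are hypothetical. It shows only that when metadata lives on one master, losing that master makes every file unreachable, whereas with several masters a single failure takes out only the files that master owns.

```python
import hashlib


class Master:
    """Toy master: holds metadata only, i.e. which chunkservers store a chunk."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.chunk_locations = {}  # (path, chunk_index) -> list of chunkserver names

    def record(self, path, chunk_index, servers):
        self.chunk_locations[(path, chunk_index)] = list(servers)

    def lookup(self, path, chunk_index):
        if not self.alive:
            raise ConnectionError(f"master {self.name} is down")
        return self.chunk_locations[(path, chunk_index)]


class MasterPool:
    """Hypothetical pool of masters, each owning a slice of the file namespace.

    Assumption: ownership is decided by hashing the file path. The article
    does not describe the real GFS2 scheme.
    """

    def __init__(self, masters):
        self.masters = list(masters)

    def _owner(self, path):
        digest = hashlib.sha256(path.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.masters)
        return self.masters[index]

    def record(self, path, chunk_index, servers):
        self._owner(path).record(path, chunk_index, servers)

    def lookup(self, path, chunk_index):
        return self._owner(path).lookup(path, chunk_index)


def reachable(store, paths):
    """Count how many paths can still have chunk 0 located through `store`."""
    count = 0
    for path in paths:
        try:
            store.lookup(path, 0)
            count += 1
        except ConnectionError:
            pass
    return count
```

Setting `alive = False` on a lone `Master` drops `reachable` to zero for every path, while doing the same to one member of a `MasterPool` leaves the other members' files available. The sketch says nothing about replication or failover between masters, which a real system would also need.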

Cutts also said that Caffeine uses other back-end technologies recently developed by the company, but he declined to name them. He indicated that these did not include updates to MapReduce, Google's distributed number crunching platform, or BigTable, its distributed database.

Whatever new infrastructure technologies underpin Caffeine, they have not been deployed across other Google services. But Cutts indicated that Google hopes to do so with at least some of them. Google's distributed global infrastructure is designed to operate like a single machine, running all its online services. Certainly, GFS2 will be deployed across the Googlenet. ®
