Google's 'post holiday' Caffeine shot still brewing

Worldwide roll-out in 'coming months'

The web is still waiting for the worldwide roll-out of Google's next-generation search infrastructure, the mysterious indexing system overhaul known as "Caffeine."

A recent Wired profile of Google's search team indicates that Caffeine has already been deployed. But it seems the technology is still limited to a single data center, and though Google had planned to roll it out to other facilities after the New Year, this has yet to happen.

According to Search Engine Land, a Google spokesperson says that Caffeine will roll out across the company's global network of data centers "over the coming months." Previously, über-Googler Matt Cutts had indicated that Caffeine would be rolled out to multiple data centers "after the holidays," meaning after the first of the year. And we're now two months on from January 1.

In early November, after testing Caffeine in a public sandbox for several weeks, Cutts indicated the platform would soon be rolled out to a single data center for use on the company's live search engine and that the company would follow suit with other data centers in a matter of weeks.

"Caffeine will go live at one data center so that we can continue to collect data and improve the technology, but I don’t expect Caffeine to go live at additional data centers until after the holidays are over," Cutts wrote on November 10. "Most searchers wouldn’t immediately notice any changes with Caffeine, but going slowly not only gives us time to collect feedback and improve, but will also minimize the stress on webmasters during the holidays."

Google did not immediately respond to our requests for comment. But that Google spokesperson tells Search Engine Land that the company expects to "roll [Caffeine] out to all data centers over the coming months." The company operates roughly 36 custom-built data centers across the globe.

"We run lots of tests with this big a change [sic] to our infrastructure," the spokesperson says. "We want the new system to meet or exceed the abilities of our current system, and it can take time to ensure that everything looks good."

It should be noted that Cutts never gave an exact date for the roll-out. He merely said it would not happen until after the holidays and - subsequently - "until at least January."

Caffeine continues to run in that single data center. In late November, according to Search Engine Roundtable, Cutts said that the Google IP address 209.85.225.103 was hitting that single Caffeinated data center 50 per cent of the time, and it appears Google search-engine IPs are still mapping to the same data center.

"The data center remains the same," the Google spokesperson tells Search Engine Land, "but different IP addresses can map to different data centers at different times due to how Google manages its traffic. Because of how Google employs custom load-balancing, there is not a single IP address that will always reach the Caffeine data center."

Cutts first unveiled Caffeine - at least partially - in August with a post to the official Google Webmaster Central blog, calling it a "secret project" to build the "next-generation architecture for Google's web search," before pointing users to a sandbox where they could test it. Speaking with The Reg days later, he called it "a fundamental re-architecting" of Google's search indexing system.

"It's larger than a revamp," he told us. "It's more along the lines of a rewrite. And it's really great. It gives us a lot more flexibility, a lot more power. The ability to index more documents. Indexing speeds - that is, how quickly you can put a document through our indexing system and make it searchable - is much, much better."

This is not a change to Google's search philosophy. It's not a change to its famous search algorithms. It's a change to the way the company builds its index of all known websites and the metadata needed to describe them - the index that the algorithms rely on. "The new infrastructure sits 'under the hood' of Google's search engine," read Cutts' original blog post, "which means that most users won't notice a difference in search results."

After interviews with Google's search team, Wired's Steve Levy described Caffeine as something that makes it even easier for engineers to add "signals" - i.e. "contextual clues that help the search engine rank the millions of possible results to any query, ensuring that the most useful ones float to the top."

Cutts confirmed with The Reg that, as we had reported earlier, Caffeine includes an overhaul of the company's distributed Google File System, or GFS. A technology two years in the making, the so-called GFS2 is a significant departure from the original Google File System that debuted almost ten years ago and now drives services across the Google empire.

With GFS, a master node oversees data that's spread across a series of distributed chunkservers - an architecture that's not exactly suited to apps that require low latency, such as YouTube and Gmail. That lone master is a single point of failure. To solve this problem, GFS2 uses not only distributed slaves, but distributed masters as well.
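
To make the single-point-of-failure argument concrete, here is a toy sketch in Python. It is not Google's code or design: the `Master` class, the `locate_chunk` helper, and the chunk and server names are all hypothetical, and it assumes each master holds a full copy of the chunk map, which is a simplification (Google has not said how GFS2 splits or replicates its metadata). It only shows why a lookup that depends on one master fails when that master is down, while a lookup that can try several masters does not.

```python
class Master:
    """Toy metadata node: maps a chunk ID to the chunkserver holding it."""

    def __init__(self, name, chunk_locations):
        self.name = name
        self.chunk_locations = dict(chunk_locations)
        self.alive = True

    def locate(self, chunk_id):
        # A dead master cannot answer any metadata request.
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return self.chunk_locations[chunk_id]


def locate_chunk(masters, chunk_id):
    """Ask each master in turn; return the first answer from a live one."""
    for master in masters:
        try:
            return master.locate(chunk_id)
        except ConnectionError:
            continue
    raise RuntimeError("no master reachable")


# Hypothetical chunk map, assumed identical on every master.
locations = {"chunk-1": "chunkserver-a", "chunk-2": "chunkserver-b"}

# One master: when it fails, every lookup fails.
single = [Master("master-0", locations)]
single[0].alive = False
try:
    locate_chunk(single, "chunk-1")
    single_result = "ok"
except RuntimeError:
    single_result = "unavailable"

# Two masters: the second one answers when the first is down.
replicated = [Master("master-0", locations), Master("master-1", locations)]
replicated[0].alive = False
replicated_result = locate_chunk(replicated, "chunk-1")

print(single_result, replicated_result)  # prints "unavailable chunkserver-a"
```

The sketch covers only availability. The latency concern raised above, with every request funnelling through one master, is a separate problem that spreading work across masters would also ease.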

Cutts also said that Caffeine uses other back-end technologies recently developed by the company, but he declined to name them. He indicated that these did not include updates to MapReduce, Google's distributed number crunching platform, or BigTable, its distributed database.

Whatever new infrastructure technologies underpin Caffeine, they have not been deployed across other Google services. But Cutts indicated that Google hopes to do so with at least some of them. Google's distributed global infrastructure is designed to operate like a single machine, running all its online services. Certainly, GFS2 will be deployed across the Googlenet. ®
