Google Research: Three things that MUST BE DONE to save the data center of the future

Think data-center design is tough now? Just you wait

'We suck at microseconds'

The third of the three challenges that Barroso identified as hampering the development of highly responsive, massively scaled data centers is perhaps a bit counterintuitive: microsecond computing.

Addressing his audience of ISSCC chip designers, he said "You guys here in this room are the gods of nanosecond computing – or maybe picosecond computing." Over the years, a whole raft of techniques have been developed to deal with latencies in the nanosecond range.

But problems remain with latencies that last much longer: those in the microsecond range. The internet and disk drives, for example, induce latencies at the millisecond level, and so far the industry has been able to deal with those by providing context switching at the five-to-seven microsecond level. No problem, really.

However, Barroso said, "Here we are today, and I would propose that most of the interesting devices that we deal with in our 'landheld' computers are not in the nanosecond level, they're not in the millisecond level – they're in the microsecond level. And we suck at microseconds."

From his point of view, the industry sucks at handling microsecond latencies simply because it hasn't been paying attention to them while it has been focused on the nanosecond and millisecond levels.

As an example of a microsecond latency in a mega–data center, he pointed to the data center itself. "Think about it. Two machines communicating in one of these large facilities. If the fiber has to go 200 meters or so, you have a microsecond." Add switching to that, and you have a bit more than a microsecond.
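
Barroso's 200-meter figure is easy to check on the back of an envelope. The sketch below assumes that light travels through optical fiber at roughly two-thirds of its vacuum speed, which corresponds to a refractive index of about 1.5. That is a typical textbook value, not a number from the talk, and the function name is ours.

```python
# Back-of-the-envelope check of one-way propagation delay in optical fiber.
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # speed of light in a vacuum
FIBER_REFRACTIVE_INDEX = 1.5          # ASSUMPTION: typical value for glass fiber


def fiber_delay_us(length_m: float) -> float:
    """Return the one-way propagation delay over the fiber, in microseconds."""
    speed_in_fiber = SPEED_OF_LIGHT_M_PER_S / FIBER_REFRACTIVE_INDEX
    return length_m / speed_in_fiber * 1e6


delay_200m = fiber_delay_us(200)
print(f"200 m of fiber: about {delay_200m:.2f} microseconds one way")
```

Under that assumption, 200 meters of fiber works out to very nearly one microsecond each way, before any switching delay is added.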

Multiply those microsecond latencies by the enormous number of communications among machines and switches, and you're talking about a large aggregate sum. Flash storage – not to mention the higher-speed, denser non-volatile memory technologies of the future – will bring still more microsecond latencies that need to be dealt with in the mega–data center.

The hardware-software solution

"Where this breaks down," Barroso said, "is when people today at Google and other companies try to build very efficient, say, messaging systems to deal with microsecond-level latencies in data centers."

The problem today is that when programmers want to call a system one microsecond away and have it respond with data that takes another microsecond to return, they reach for remote procedure call or messaging libraries, even in a distributed system where a direct RDMA operation would do the job.

When using such a library, he said, "Those two microseconds quickly went to almost a hundred microseconds." Admittedly, some of that problem is that software is often unnecessarily bloated, but Barroso says that the main reason is that "we don't have the underlying mechanisms that make it easy for programmers to deal with microsecond-level latencies."
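
To put those figures side by side, here is a rough sketch using the numbers quoted in this article: two microseconds on the wire, "almost a hundred" through a library (rounded up to 100 here), and context switching in the five-to-seven microsecond range (we take 6 as a midpoint). The 10-millisecond disk access is our own illustrative assumption, not a figure from the talk, and all the names are ours.

```python
# Rough comparison of software overhead against raw device latency.
WIRE_ROUND_TRIP_US = 2.0       # one microsecond out, one back (from the article)
LIBRARY_ROUND_TRIP_US = 100.0  # "almost a hundred microseconds", rounded up
CONTEXT_SWITCH_US = 6.0        # midpoint of the five-to-seven range
DISK_ACCESS_US = 10_000.0      # ASSUMPTION: a 10 ms disk access, for illustration


def slowdown(actual_us: float, ideal_us: float) -> float:
    """How many times longer the actual path takes than the ideal one."""
    return actual_us / ideal_us


def overhead_share(actual_us: float, ideal_us: float) -> float:
    """Fraction of the actual time that is not spent on the ideal path."""
    return (actual_us - ideal_us) / actual_us


library_slowdown = slowdown(LIBRARY_ROUND_TRIP_US, WIRE_ROUND_TRIP_US)
library_overhead = overhead_share(LIBRARY_ROUND_TRIP_US, WIRE_ROUND_TRIP_US)
switch_vs_disk = CONTEXT_SWITCH_US / DISK_ACCESS_US
switch_vs_wire = CONTEXT_SWITCH_US / WIRE_ROUND_TRIP_US

print(f"Library path is {library_slowdown:.0f}x the wire time")
print(f"Share of that path that is overhead: {library_overhead:.0%}")
print(f"Context switch relative to a disk access: {switch_vs_disk:.2%}")
print(f"Context switch relative to the wire round trip: {switch_vs_wire:.0%}")
```

On these numbers, a context switch is a rounding error next to a disk access but costs several times the wire round trip, which is one way to read Barroso's complaint that mechanisms tuned for milliseconds don't fit microsecond devices.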

This is a problem that will have to be dealt with at both the hardware and software levels, he said – and by the industry promoting microsecond-level latencies to a first-order problem when designing systems for the mega–data center.

All three of these challenges are about creating highly scalable data centers that can accomplish the goal of "big data, little time" – but despite the fact that he was speaking at an ISSCC session during which all the other presenters spoke at length about the buzzword du jour, Barroso refused to be drawn into the hype-fest.

"I will not talk about 'big data' per se," he said. "Or to use the Google internal term for it: 'data'." ®
