Google's Urs Hölzle: If you're not breaking your own gear, you aren't ambitious enough

Infrastructure king on next-gen memory, FPGAs, and more

Interview In the past fifteen years Google has gone from being a consumer of tech to an inventor of technologies, and in doing so has had profound effects on the modern web.

One of the key people behind that shift has been Urs Hölzle, who joined the company as its eighth employee and now serves as a senior vice president of technical infrastructure and one of its "Google Fellows".

One of Hölzle's main jobs is to plan out the technologies Google needs to use, how it needs to use them, and what paths it absolutely shouldn't go down.

At the GigaOm Structure conference in San Francisco, he sat down with The Register for an interview about what Google thinks of next-gen memory technologies, whether lashing FPGAs to CPUs is a good idea, how distributed systems need to be run and managed, and which aspects of Google's own for-sale cloud services can benefit from the company's internal infrastructure.

What follows is a transcript of that conversation that has been edited for clarity and brevity.

How far out does Google need to look when it comes to the types of hardware components you contemplate buying in a few years? Specifically, what do you think about next-generation memory technologies?

The things you focus most of your time on are nine months out: this is the next generation that you're developing right now. Then there's what we call n-plus-one, where you work concurrently on the thing after that, so you already have concrete prototypes or whatever, but it's not ready and the silicon isn't really available, so you try things out.

PCM or memristors or whatever is what you have testbeds or simulations for, but you have no comprehension of timeline because you don't know when they're going to be available.

There's a number of these in the air - silicon photonics, different kinds of storage - and I think the way we look at it is you have to be prepared, you have to play with these things to understand what they look like.

You can't really anticipate. Memristors, three years ago, were being announced as nine months in the future, and now they're due 2017 or maybe end of decade, so, you know, TBD.

The other thing is often to really take advantage of a new technology, you need to have it at least partially available because you need to go and say 'how would I rewrite search' in order to use this. If you just have a simulation, it's a billion times slower than the real thing, so there's only so much you can do in figuring out 'what would you do if'. The truth is normally something like 18 months or 24 months is enough to get it done by the time the thing is actually production ready.

There's a tension between centralizing the systems providing features, and distributing them across your infrastructure so you can be flexible at the cost of speed. How does Google decide where it needs to sit between the two?

The key thing is that you can't be religious about it. Things change, and I think in the next five years there's at least the chance that technology will change much more meaningfully than it has in the past five years.

Exactly how that works out, that really depends on the specific factors. If something improves [in performance] by a factor of two or factor of eight, that really changes how you react to it.

The important thing is you don't get too set on one approach. Disaggregation is a great thing but it's not the only thing to pay attention to: there may very well be times when disaggregation is much less important than something else. For example, maybe you want to package things more closely at some point.

You and some others at Google came up with the idea that 'the data center is the computer'. What are some of the implications of treating DCs in that way, and have you run into any problems that you didn't anticipate?

You always run into problems. We've really, for the last ten years, easily been at the forefront of trying to solve these problems and you get things wrong all the time. That comes with the territory. If it doesn't, you probably didn't try something ambitious enough. One of the big advantages of software is that it's much more malleable. You can go full hog in the wrong direction, and once you realize it, changing direction isn't that expensive and it doesn't take that much time.

With hardware, you have sunk cost, you have this thing, and you've built it and you've spent the money on it and you can't really refurbish it or change it very much. So the more flexibility you put in, and the more control is outside the box, the easier it is to react to new demands.

We've been big supporters of [networking protocol] OpenFlow, for example. On a traditional networking box you have millions of lines of software in it. On OpenFlow you have thousands of lines of software in it - really just enough to control the box and the fans and program the chips, but really all the intelligence is elsewhere, and that allows you to change them.

Like, you have a new routing [scheme] and it works for multiple boxes because the box never knew what routing was so therefore you don't have to update it. [The boxes are] really focused on the hardware design; someone else tells them how to program their hardware tables, et cetera. The boxes don't really know that they're implementing VPN or some routing. That's number one.
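To make the split Hölzle describes concrete, here is a minimal sketch in Python. It is not the OpenFlow protocol or any real controller API: the `Switch` and `Controller` classes and both policy functions are hypothetical names invented for illustration. The point is only that the box holds a lookup table, and swapping the routing scheme means swapping a function in the controller, not updating the box.

```python
from dataclasses import dataclass, field


@dataclass
class Switch:
    """A 'dumb' box: it only holds a destination -> output-port table."""
    name: str
    table: dict = field(default_factory=dict)

    def install(self, dst_prefix: str, out_port: int) -> None:
        # The switch stores the entry without knowing why it was chosen.
        self.table[dst_prefix] = out_port

    def forward(self, dst_prefix: str):
        # Pure table lookup; None means no rule has been installed.
        return self.table.get(dst_prefix)


class Controller:
    """Holds the routing policy; the switches never see it."""

    def __init__(self, policy):
        # policy is a callable: (switch_name, dst_prefix) -> output port
        self.policy = policy

    def program(self, switches, prefixes):
        # Rewrite every switch's table from the current policy.
        for sw in switches:
            sw.table.clear()
            for prefix in prefixes:
                sw.install(prefix, self.policy(sw.name, prefix))


def single_port_policy(switch_name, prefix):
    # Hypothetical policy: send everything out of port 1.
    return 1


def spread_policy(switch_name, prefix):
    # Hypothetical replacement policy: spread prefixes over ports 1 and 2.
    return 1 + (sum(prefix.encode()) % 2)


switches = [Switch("s1"), Switch("s2")]
prefixes = ["10.0.0.0/24", "10.0.1.0/24"]

Controller(single_port_policy).program(switches, prefixes)
# Changing the routing scheme touches only the controller side.
Controller(spread_policy).program(switches, prefixes)
```

The `Switch` class is identical under both policies, which is the property the interview attributes to boxes that "never knew what routing was".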

Number two is the larger your pool is, or the more you think about things as pooled resources, the easier it is to be flexible about how that's being used.

When we think about memory, what's the right ratio of memory to CPUs, it's much easier if you can think about the pool. Like, here's a cluster, do I have enough memory in the cluster as a pool and if not, I don't need to upgrade every single machine – I need to add enough memory to the pool and then the cluster management system can figure it out and put the high memory jobs on the high memory machines.
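The pooled view can be sketched in a few lines of Python. This is an illustration only, not Google's cluster manager: the machine sizes, job names, and the best-fit heuristic are all assumptions made for the example. It checks capacity at the level of the pool, then places the largest jobs first on the tightest machine that still fits them, so high-memory jobs end up on high-memory machines.

```python
def pool_has_capacity(machines_gb, jobs_gb):
    """Pool-level check: is total memory at least total demand?

    This is necessary but not sufficient, since a single large job
    still has to fit on one machine.
    """
    return sum(machines_gb.values()) >= sum(jobs_gb.values())


def place_jobs(machines_gb, jobs_gb):
    """Greedy best-fit-decreasing placement (an assumed heuristic).

    Returns a dict mapping job name -> machine name.
    """
    free = dict(machines_gb)
    placement = {}
    # Largest jobs first, so they get the big machines before those fill up.
    for job, need in sorted(jobs_gb.items(), key=lambda kv: kv[1], reverse=True):
        candidates = [m for m, gb in free.items() if gb >= need]
        if not candidates:
            raise RuntimeError(f"no machine can hold {job} ({need} GB)")
        # Pick the machine with the least free memory that still fits.
        target = min(candidates, key=lambda m: (free[m], m))
        free[target] -= need
        placement[job] = target
    return placement


# Hypothetical cluster: two 16 GB machines and one 32 GB machine.
machines = {"m1": 16, "m2": 16, "m3": 32}
# Hypothetical jobs; "index" needs more than a 16 GB machine can offer.
jobs = {"index": 19, "cache": 12, "web": 8, "batch": 6}

if pool_has_capacity(machines, jobs):
    print(place_jobs(machines, jobs))
```

Here only one machine in the pool is large enough for the 19 GB job, and that is enough: nothing has to be done to the other two.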

It's much easier to evolve things that way than to say 'wow, actually I thought I need 16 gigs [of RAM] and now I realize I need 19 gigs and I have to go into every machine and put in 3 gigs ... oh wait I can't put in 3 gigs, the minimum increment is 8, and then I'm going to throw out the existing DIMM slot I have because there's only so many slots.'
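The arithmetic behind that complaint can be worked through in Python. The 16 GB, 19 GB, and 8 GB figures come from the quote above; the cluster size of 1,000 machines is an assumption for illustration, and the pooled figure further assumes the scheduler can place jobs freely across the cluster. Slot limits are ignored.

```python
import math


def per_machine_upgrade_gb(machines, have_gb, need_gb, increment_gb):
    """Memory to buy if every box must individually reach need_gb."""
    shortfall = max(0, need_gb - have_gb)
    # Each box rounds its own shortfall up to the minimum increment.
    per_box = math.ceil(shortfall / increment_gb) * increment_gb
    return machines * per_box


def pooled_upgrade_gb(machines, have_gb, need_gb, increment_gb):
    """Memory to buy if only the pool total must reach the target."""
    shortfall = max(0, need_gb - have_gb) * machines
    # Rounding happens once, over the whole pool.
    return math.ceil(shortfall / increment_gb) * increment_gb


N = 1000  # hypothetical cluster size
print(per_machine_upgrade_gb(N, 16, 19, 8))  # 8000
print(pooled_upgrade_gb(N, 16, 19, 8))       # 3000
```

Under these assumptions the per-box approach buys 8 GB for each of 1,000 machines to cover a 3 GB gap, while the pooled approach adds 8 GB to 375 machines and leaves the rest untouched.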

If you think just about the box, that gets very awkward over a three-year timeframe in a field like ours where the requirements change all the time, and applications change all the time. By managing things in software you get more flexibility, and by pooling things you get more flexibility as well.
