Original URL: http://www.theregister.co.uk/2008/11/26/patterson_keynote_sc08/

RISC daddy conjures Moore's Lawless parallel universe

The 64-core laptop

By Timothy Prickett Morgan

Posted in Servers, 26th November 2008 07:38 GMT

The oft-cited Moore's Law is the fulcrum of the IT industry: over the decades it has given us ever-faster and more sophisticated computing technology. This in turn has allowed the IT industry to convince us that every one, two, or three years we need new operating systems, better performance, and new and more complex applications. But ask yourself this: What happens to the IT industry if the performance improvements stop?

That is the question that one of the luminaries of the computer industry, David Patterson, posed last week in his keynote address at the SC08 supercomputing event in Austin, Texas. Patterson is one of the most important thinkers in computer science, and when he says there's a problem, people listen.

If you don't know who Patterson is, you know some of his work. At the University of California at Berkeley, where he has been a member of the computer science faculty since 1977, Patterson led the design and implementation of RISC I, which some have called the very first VLSI RISC computer and which laid the foundation for what would eventually become Sun Microsystems' Sparc processor. Patterson also led the Redundant Arrays of Inexpensive Disks (RAID) project, which made arrays of cheap PC-style disks look like mainframe-class disks in terms of reliability and capacity. RAID disks of various stripes are the norm in storage today.

These days, Patterson is still at Berkeley, where he runs the Parallel Computing Laboratory - Par Lab for short - which is funded largely by Intel and Microsoft. As the name suggests, the lab is trying to tackle the parallel computing problem in new ways. Both corporate and consumer computing are now wrestling with this parallelism problem, right there in the data center and on the desktop; it is a problem that has plagued supercomputing for decades. Specifically, we have been trying to make many relatively slow computers do the work that would be done by a single, large, and imaginary computer. Yes, imaginary. The laws of physics (and particularly thermodynamics) don't allow you to build it.

In the old days of computing - which was only a few years ago - everyone expected the ever-shrinking transistor to keep delivering faster processors, so that single-threaded applications would simply run faster and faster. "This is an example of faith-based science," Patterson quipped in his opening, and he reminded the audience that only a few years ago he, too, had simply assumed that chip-making processes would let chips crank up their clocks and still stay within a 100 watt thermal envelope. He showed what the chip roadmap looked like in the early-to-mid 2000s and how it was then revamped:

Chip Roadmaps

As you can see, as recently as 2005 the expectation was for chips running well above 20 GHz by 2013. A few years later, the expectation had shifted to possibly having chips at 5 GHz by 2007, reaching up towards 8 GHz or so. Take a look at the actual Intel multicore line on the chart. We are stuck at around 3 GHz with x64 processors, and all that Moore's Law is getting us is more cores on a die with each passing year.

Single-thread performance has stalled, with some exceptions here and there. We hit a thermal wall, and it just wasn't practical to ramp up clock speeds any more. And so we started cookie-cutting cores onto dies: first two, then four or more. This is how we have been using Moore's Law to boost the performance inside a single CPU socket. But according to Patterson, we may well be engaging in more faith-based science.

"The whole industry is assuming that we will solve this problem that a lot of people have broken their picks on." To make his point, Patterson told the nerds at SC08 to imagine a 32-core laptop and then an upgrade to a 64-core laptop, and then asked if they thought it would do more work on the kinds of workloads that run on a laptop. "We'd all bet against that. But that is the bet this industry has made."

Why Make the Bet?

So why make the bet? We don't have much of a choice. No one can build a faster computer (meaning one with a higher clock speed) to boost performance. And while shared computing infrastructures let many people split up the performance inherent in multicore processors (even if their slices don't run applications particularly faster), that does not really solve the problem. (It may be all we get, though.)
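Patterson's 32-core versus 64-core laptop question can be made concrete with Amdahl's Law, a standard model that the keynote is not reported as invoking: if a fraction p of a program's work can run in parallel, the best-case speedup on n cores is 1 / ((1 - p) + p/n). The minimal Python sketch below assumes that idealised model (no communication, synchronisation or memory-bandwidth costs), and the workload fractions in it are illustrative guesses, not measurements of real laptop software.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's Law: best-case speedup = 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    serial = 1.0 - parallel_fraction  # the part no extra core can help with
    return 1.0 / (serial + parallel_fraction / cores)


# Hypothetical workload mixes; the fractions are illustrative, not measured.
for p in (0.50, 0.90, 0.99):
    s32 = amdahl_speedup(p, 32)
    s64 = amdahl_speedup(p, 64)
    print(f"p={p:.2f}: 32 cores -> {s32:5.2f}x, 64 cores -> {s64:5.2f}x, "
          f"gain from doubling = {s64 / s32:.2f}x")
```

Under this model, even with 90 per cent of the work parallelised, going from 32 to 64 cores lifts the best-case speedup only from about 7.8x to about 8.8x, because the serial 10 per cent dominates.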

Perhaps more importantly, said Patterson, everyone has been forced into trying to crack the parallelism problem, and the courage people get from this hard, cold fact will spur innovation.

Rather than take on parallelism at the system level, the Par Lab - established two years ago - is focusing on parallelism inside individual processors and system-on-a-chip designs. This is where the laptop question above will be answered, after all. The project aims to design programming methods that produce efficient, portable software that can run on 100 or more cores and keep scaling as the core counts in single-chip machines double every two years.

Solving this particular variant of the parallelism problem is going to take some mind shifting across the IT industry. First, we must stop asking who will need a 100-core processor to run Microsoft Word. Patterson said that while he is a Word user - and he likes Word - the question irks him.

"Questions like that make me think we have failed as educators," Patterson said. The real issue has nothing to do with supporting legacy applications. "I am pretty sure that the best software has not been written yet," Patterson said. And he gave a few examples of neat projects that could eat up a lot of parallel processing capacity in a single system.

The first was a 3D sound system that uses an array of 120 tweeters, a prototype that is already running at Berkeley's Center for New Music and Audio Technologies. The same kind of signal processing, done in software on a parallel chip, could turn laptops and handheld devices into hearing augmenters. Take it a step further and add some facial recognition software, and you get something Patterson called a "name whisperer." This device would tell you who is walking up to talk to you and, based on an archive of your past conversations, why you might care.

Another might be a content-based image retrieval system, which searches a database of thousands of images by the content of the images themselves rather than by textual tags attached to them. Or how about a little thing called the meeting diarist? This would be a laptop or handheld that records audio and video of meetings, transcribes them for you, and maybe even facilitates the exchange of data with people at the meeting. There is even a parallel Web browser in the works, designed from the ground up to really take advantage of all those cores. It uses something called SkipJax, a parallel replacement for JavaScript and AJAX.

The New Parallel Paradigm

Another mind shift that the IT industry is going to have to undergo is a change in the way we think about programming. Patterson and his team believe that there should be two layers of software in this new parallel paradigm, one he called the efficiency layer and the other he called the productivity layer. The efficiency layer would be made up of about 10 per cent of programmers: the experts at creating frameworks and libraries, the people who can get down close to the metal and wring efficiency out of code. The remaining 90 per cent would work in the productivity layer; they would be domain experts in particular fields or industries who take the frameworks and libraries and turn them into applications.
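To illustrate the two-layer split, here is a minimal Python sketch. The function names (`parallel_map`, `word_count`, `count_words_in_corpus`) are hypothetical and not part of any Par Lab framework: the efficiency-layer function hides how work is spread across cores, while the productivity-layer functions only say what to compute. It uses threads from the standard library's `concurrent.futures`; in CPython, CPU-bound pure-Python code would need `ProcessPoolExecutor` to get a real speedup, so treat this as a picture of the division of labour rather than a performance recipe.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


# --- Efficiency layer: a hypothetical library function written by parallel experts ---
def parallel_map(fn: Callable[[T], R], items: Sequence[T],
                 workers: Optional[int] = None) -> List[R]:
    """Apply fn to every item on a pool of worker threads, keeping input order."""
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the same order as the inputs.
        return list(pool.map(fn, items))


# --- Productivity layer: application code written by a domain expert ---
def word_count(document: str) -> int:
    return len(document.split())


def count_words_in_corpus(documents: Sequence[str]) -> List[int]:
    # The domain expert states *what* to compute per document;
    # *how* it is spread over cores is left to parallel_map.
    return parallel_map(word_count, documents)


print(count_words_in_corpus(["the 64-core laptop", "", "RISC daddy"]))  # [3, 0, 2]
```

The point of the split is that if the efficiency-layer experts later swap threads for processes, vector units or a GPU, `count_words_in_corpus` does not have to change.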

Now here's the neat bit. To help make parallel programming easier, Par Lab's experts want to take advantage of parallelism itself and create "auto-tuners" that run lots of different optimizations on code as it is compiled and heuristically search for the best version of the compiled code to run on a particular piece of hardware. Patterson said that in early tests, an auto-tuner capable of machine learning was about 4,000 times faster than an expert at tuning the code - and tuning code is the big problem with parallel architectures.
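The auto-tuning idea can be sketched as an empirical search: generate several variants of the same kernel, check that each gives the right answer, time them on the target machine and keep the fastest. The Python below is a simplified illustration under stated assumptions, not Par Lab's tuner. The kernel (a blocked matrix transpose), the candidate tile sizes and the `autotune` helper are all hypothetical, and the search is exhaustive, whereas the tuners Patterson described search heuristically and can use machine learning to prune the space. In pure Python the timing differences between variants are small and noisy; real auto-tuners apply the same loop to generated compiled code.

```python
import time
from typing import Callable, Dict, List, Optional

Matrix = List[List[float]]


def make_blocked_transpose(block: int) -> Callable[[Matrix], Matrix]:
    """Build one transpose variant that walks the input in block x block tiles."""
    def kernel(a: Matrix) -> Matrix:
        rows, cols = len(a), len(a[0])
        out = [[0.0] * rows for _ in range(cols)]
        for i0 in range(0, rows, block):
            for j0 in range(0, cols, block):
                for i in range(i0, min(i0 + block, rows)):
                    row = a[i]
                    for j in range(j0, min(j0 + block, cols)):
                        out[j][i] = row[j]
        return out
    return kernel


def autotune(candidates: Dict[int, Callable[[Matrix], Matrix]],
             sample: Matrix, repeats: int = 3) -> Optional[int]:
    """Time every correct candidate on sample input; return the fastest key."""
    expected = [list(col) for col in zip(*sample)]  # straightforward reference transpose
    best_key: Optional[int] = None
    best_time = float("inf")
    for key, kernel in candidates.items():
        if kernel(sample) != expected:  # never keep a fast but wrong variant
            continue
        start = time.perf_counter()
        for _ in range(repeats):
            kernel(sample)
        elapsed = (time.perf_counter() - start) / repeats
        if elapsed < best_time:
            best_key, best_time = key, elapsed
    return best_key


# Hypothetical search space: five tile sizes for a 128 x 128 matrix.
sample = [[float(i * 128 + j) for j in range(128)] for i in range(128)]
variants = {b: make_blocked_transpose(b) for b in (4, 8, 16, 32, 64)}
best = autotune(variants, sample)
print("fastest tile size on this machine:", best)
```

The winning tile size can differ from machine to machine, which is exactly why the search is run on the target hardware rather than decided once by hand.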

The industry faces many more challenges in coping with parallelism. One of them might just be an explosion of custom-made processors, FPGAs, and other computing elements woven together into future systems that look nothing like the relatively simple devices we called personal computers or servers a few years ago.

Patterson is also advocating standardized methods for processors and other system components to gather power and performance information and feed it back into the programming tools. That way, efficiency programmers can figure out why the system isn't using all of its available memory bandwidth, and productivity programmers can do what-if analysis on what happens to thermals or performance if they change their code.
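One standard way to do that kind of what-if analysis is the roofline model, published by Berkeley's Samuel Williams, Andrew Waterman and Patterson: attainable performance is the lower of the chip's peak compute rate and its peak memory bandwidth multiplied by the kernel's arithmetic intensity (floating-point operations per byte moved). The article does not say the keynote presented it, and the machine figures in the Python sketch below are made-up round numbers. On real hardware, the byte and time inputs to `bandwidth_fraction` would come from the kind of performance counters Patterson wants standardized; all function names here are hypothetical.

```python
def attainable_gflops(peak_gflops: float, peak_bandwidth_gbs: float,
                      flops_per_byte: float) -> float:
    """Roofline: performance is capped by compute or by memory traffic."""
    return min(peak_gflops, peak_bandwidth_gbs * flops_per_byte)


def limiting_resource(peak_gflops: float, peak_bandwidth_gbs: float,
                      flops_per_byte: float) -> str:
    # The "ridge point" is the intensity at which both ceilings meet.
    ridge = peak_gflops / peak_bandwidth_gbs
    return "memory bandwidth" if flops_per_byte < ridge else "compute"


def bandwidth_fraction(bytes_moved: float, seconds: float,
                       peak_bandwidth_gbs: float) -> float:
    """Share of peak memory bandwidth that a measured run actually achieved."""
    return (bytes_moved / seconds) / (peak_bandwidth_gbs * 1e9)


# Hypothetical chip: 100 GFLOP/s peak compute, 25 GB/s peak memory bandwidth.
PEAK, BW = 100.0, 25.0
for intensity in (0.5, 2.0, 8.0):  # what-if: restructure code to reuse data more
    print(f"{intensity:>4} flops/byte -> "
          f"{attainable_gflops(PEAK, BW, intensity):6.1f} GFLOP/s, "
          f"limited by {limiting_resource(PEAK, BW, intensity)}")
```

On this imaginary chip, raising arithmetic intensity from 0.5 to 2 flops per byte (for example, by blocking a loop so data is reused from cache) quadruples the attainable rate from 12.5 to 50 GFLOP/s, while past the ridge point of 4 flops per byte extra reuse no longer helps.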

"There was a decade or so where we were polishing a pretty round stone," Patterson explained. "Going forward, the field is really wide open, but research really has to deliver on this. The IT industry is really going to have to deliver on doubling the core count every year and on getting value out of that."

Either that or the software business collapses and a whole lot of IT jobs go out the window as the industry shifts from a growth market, where software keeps driving us to upgrade to faster (well, more capacious) systems, to a replacement market, where we just buy a new machine when the old one breaks. ®