HPC battle royale: Exotic models vs Frankenstein monsters
Who will win the exascale supercomputer's heart?
HPC blog My article comparing supercomputer performance and price/performance to common computers generated quite a few comments. For those who didn’t see the initial story, the Fujitsu K computer is a 10-petaflop monster that’s currently the fastest computer in the world. It’s roughly 4x faster than the second-place Tianhe-1A, the Chinese system that topped the chart at the end of 2010.
Most of the comments on the home computer vs supercomputer article were the typical mix of humour, flames and thoughtful asides, but one in particular caught my eye. It was from “buzza,” who mused:
The K machine is mighty pricey, and it would (be) interesting to see how that cost breaks down into CPU vs I/O development. The K machine has a very elaborate interconnect. This must surely take a lot of the credit for the machine's sustained performance being so close to the theoretical peak performance. The cost break down might illustrate where investment pays off best.
The K computer delivers incredible performance, but it carries an equally incredible price tag: $1.25bn to build and $10m annually to operate.
For comparison purposes, the IBM Roadrunner (the first 1 PFLOP system) cost about $100m back in 2008. So from Roadrunner to K computer, we saw both performance and costs move up an order of magnitude. Fair enough.
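To put a rough number on that, here is a back-of-the-envelope sketch using only the figures quoted above. It assumes round performance numbers (1 PFLOP for Roadrunner, 10 for K), counts build cost only, and makes no adjustment for the three years between the two systems; the names `systems` and `cost_per_pflop` are just illustrative.

```python
# Build cost and performance as quoted in the article (rounded figures).
systems = {
    "Roadrunner": {"cost_usd": 100e6, "pflops": 1.0},
    "K computer": {"cost_usd": 1.25e9, "pflops": 10.0},
}


def cost_per_pflop(cost_usd, pflops):
    """Return build cost in dollars per petaflop of performance."""
    return cost_usd / pflops


# Dollars per petaflop for each system.
ratios = {name: cost_per_pflop(**spec) for name, spec in systems.items()}

for name, ratio in ratios.items():
    print(f"{name}: ${ratio / 1e6:.0f}m per petaflop")
```

On these numbers the two machines land in the same ballpark, at about $100m and $125m per petaflop, so the order-of-magnitude jump in performance was matched by the jump in cost rather than delivered at a discount.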
Much of the cost behind the K computer was in designing the system innards, primarily the proprietary interconnect and surrounding bits. It was the same thing with Roadrunner; much of the development time and money was spent working out how to get Opteron processors and PowerXCell accelerators (closely related to the Cell BE chip in the PlayStation 3) to work well together in the same system.
Both of these systems, aside from being the first to cross performance hurdles, are departures from the conventional systems that populate most of the Top500 list and most HPC data centres.
Supercomputers used to be highly customised systems that were essentially built from scratch and shared few, if any, common components (other than copper and electrons). The hardware and operating systems were unique to a particular vendor and even machine type.
All of this changed in the 1990s when increasing HPC demand combined with a number of other factors (including the rise of Linux and the falling cost of commodity parts) to bring about what is now the typical supercomputer: a collection of individually inexpensive common parts that are lashed together to build a massive single cluster or MPP system.
In a lot of ways, this was a grassroots effort fuelled by customer desire to get more FLOP/s per dollar, aided by their willingness to roll their own system software and re-engineer their apps.
Over time, as the commodity movement picked up steam, it was embraced by existing and new vendors. Building supercomputers out of commonly available parts opened up the industry to lots of new players who were able to build competitive systems (in price and price/performance terms) by “simply” combining commodity parts together.
They’re able to stay on the performance curve by taking advantage of steady gains from processors (à la Moore’s Law) and networking/interconnect technologies. This isn’t to say that building a commodity-based supercomputer is now simple – but it’s a lot easier than having to design and build all of the major components yourself.
The K computer wasn’t built in this mold. It uses SPARC processors, not Intel or AMD procs. While SPARC is a widely used processor, it hasn’t been widely used in HPC since the early 2000s. The K computer team built their own highly sophisticated 6D torus (like there’s a non-sophisticated 6D torus, right?) to connect nodes together, eschewing the typical InfiniBand or Ethernet-based interconnect. It’s also unique in what it doesn’t use: accelerators (either GPUs or FPGAs). The K computer relies on lots and lots of traditional CPUs, with more than 700,000 cores total.
K isn’t the only throwback system on the Top500. In addition to the aforementioned Roadrunner (which still comes in at #10), there are plenty of top systems that aren’t fuelled by x86 processors, including the 14th fastest system in the world: the 800 TF Sunway Blue Light system, which relies on 16-core ShenWei RISC processors.
In terms of system count, almost 90 per cent of the systems on the current Top500 list are based on x86 processors from AMD or Intel. But the 10 per cent or so of systems that aren’t x86-based pack quite a punch, accounting for 27 per cent of the total performance (as measured by a sum of Rmax ratings).
Many industry watchers, myself included at times, figured that the commodity model would swamp the custom system model sooner or later. While that’s mostly happened, the K computer, Roadrunner, Blue Light and, arguably, the CPU/GPU hybrids like Tianhe have fought against that tide and made quite a splash, at least in the deepest part of the deep computing pool. (And I’ve just hit my personal best for reuse of the same metaphor – a high-water mark for me!)
So what does this bode for the future? Will commodity rule the roost, or will we see a new crop of custom systems using exotic combinations of at least semi-proprietary parts? I think that the move to exascale is going to require different approaches and technologies – we’re not going to get there by just shrinking and cranking up the frequency of existing parts. This is a situation where turning up the amps to ‘10’ won’t get it done; exascale is going to demand ‘11’. ®