What should replace Linpack for ranking supercomputers?

Our Big Iron man has a few ideas following Top500

Internet Security Threat Report 2014

There is always an immediate predecessor, so don't try to tell me there isn't

This strikes me as a software problem, not a hardware problem: and not only that, one that doesn't have any particular bearing on the relevance of Linpack. And even Heroux and Dongarra basically admit as much in the next paragraph of the HPCG proposal:

"Of course, one of the important research efforts in HPC today is to design applications such that more computations are Type 1 patterns, and we will see progress in the coming years. At the same time, most applications will always have some Type 2 patterns and our benchmarks must reflect this reality. In fact, a system's ability to effectively address Type 2 patterns is an important indicator of system balance."

What this server hack really wants to know is how the initial Oak Ridge application set is performing on Titan, how much work needs to be done to optimize the code and actually use those GPU accelerators that the US taxpayers (like me) have shelled out for.

I want something even more devious than that, too. I want to see the Top500 list completely reorganized, using its historical data in a more useful fashion. I want the list to show a machine and its immediate predecessor. There is always an immediate predecessor, so don't try to tell me there isn't, and it is running a key real-world application, too. So, for these two machines, I want three numbers: Peak theoretical flops, sustained Linpack flops, and relative performance of the key workload (you won't be able to get actual performance data for this, of course.)

I want the Top500 table to show the relative performance gains for all three across the two machines. And predecessor machines don't necessarily have to be in the same physical location, either. The "Kraken" super at the University of Tennessee runs code for NOAA, which also has its own machines.

So, for instance, it might show that Titan has a peak theoretical performance of 27.12 petaflops across its CPUs and GPUs, and running the Linpack test and loading up both computing elements it was able to deliver 17.59 petaflops of oomph in an 8.21 megawatt power envelope. Importantly, the Titan machine has 560,640 Opteron cores running at 2.2GHz. The "Jaguar" super that predates Titan at Oak Ridge (technically, Jaguar was upgraded to Titan in two steps) was an all-CPU machine with 298,592 cores running at 2.6GHz that had a peak theoretical performance of 2.63 petaflops and a sustained Linpack performance of 1.94 petaflops, all in a 5.14 megawatt power envelope.

So the delta on peak performance moving from the last iteration of Jaguar (it had a processor and interconnect upgrade a little more than a year ago before adding the GPUs to make it a Titan late last year) was a factor of 10.31, but on sustained performance for the Linpack test, the delta between the two machines was only a factor of 9.06. And, by the way, that was only because a lot of the calculations were done on the GPUs, and even then, Titan still only had a computational efficiency across those CPUs and GPUs of 64.9 per cent. It may be that it just cannot be pushed much higher than that. The best machines on the list might have an 85 to 90 per cent computational efficiency.

But here's the fun bit. If you just look at the CPU portions of the Jaguar and Titan machines, then the aggregate floating point delta moving from Jaguar to Titan was a mere 58.9 per cent. In other words, if a key Oak Ridge application could, in theory, be deployed across nearly twice as many cores and a faster interconnect and make use of them all perfectly efficiently and did not offload any calculations to the GPU, then you get a speedup of about a factor of 1.6. If you offload some calculations to the Tesla GPUs - but only some, as Heroux and Dongarra said was the case - then maybe, and El Reg is guessing wildly here - maybe you could get that delta up to a factor of 2X, 3X, or even 4X.

The point is to not guess, but for Oak Ridge to say as part of a Top500 submission. Why are we all guessing? Moreover, why not have Top500 ranked by the familiar Linpack, add a HPCG column, then one key real-world relative performance metric, and do the before and after machines? More data is better.

There are some other things that a revised supercomputer benchmark test needs as well, since we are mulling it over. Every machine has a list price, and every benchmark run should have one. Every machine has a measured wall power when it is running, and every benchmark run should have one. The Top500 list is fun, but it needs to be more useful, and the reality is that supercomputing is more constrained by the power envelope and the budget than any other factors, and this data has to be brought into the equation. Even estimated street prices are better than nothing, and I am just annoyed enough to gin them up myself and get into a whole lot of trouble.

Another thing to consider is that the needs of the very high end of the supercomputer space are far different from the needs of the rest of the HPC community.

At the high end, research labs and HPC centers are writing their own code, but smaller HPC installations are running other people's code. The addition of a real-world workload is useful for such customers in particular if it is an off-the-shelf application. What would be truly useful is a set of clusters of varying sizes and technologies running the ten top HPC applications - perhaps one for each industry - with all the same before and after configurations and deltas as machines are upgraded in the field. I think what you will find is that a lot of commercial applications can't scale as far as you might hope across ever-increasing cluster sizes, and software developers in the HPC racket would be extremely adverse to such odious comparisons. It would be nice to be surprised and see third-party apps scale well, so go ahead, I dare you - surprise me.

The HPC industry needs a complete set of benchmark test results that show a wide variety of components being mixed and matched in different patterns and showing the effects on application performance. We don't need 50 different Linpack test results on clusters with slightly different configurations, but results on as many diverse systems as we can find and test. This is how Dongarra originally compiled his Linpack benchmarks, and it was the right way to do it. ®

Beginner's guide to SSL certificates

More from The Register

next story
Docker's app containers are coming to Windows Server, says Microsoft
MS chases app deployment speeds already enjoyed by Linux devs
'Hmm, why CAN'T I run a water pipe through that rack of media servers?'
Leaving Las Vegas for Armenia kludging and Dubai dune bashing
'Urika': Cray unveils new 1,500-core big data crunching monster
6TB of DRAM, 38TB of SSD flash and 120TB of disk storage
Facebook slurps 'paste sites' for STOLEN passwords, sprinkles on hash and salt
Zuck's ad empire DOESN'T see details in plain text. Phew!
SDI wars: WTF is software defined infrastructure?
This time we play for ALL the marbles
Windows 10: Forget Cloudobile, put Security and Privacy First
But - dammit - It would be insane to say 'don't collect, because NSA'
Oracle hires former SAP exec for cloudy push
'We know Larry said cloud was gibberish, and insane, and idiotic, but...'
Symantec backs out of Backup Exec: Plans to can appliance in Jan
Will still provide support to existing customers
prev story


Forging a new future with identity relationship management
Learn about ForgeRock's next generation IRM platform and how it is designed to empower CEOS's and enterprises to engage with consumers.
Why cloud backup?
Combining the latest advancements in disk-based backup with secure, integrated, cloud technologies offer organizations fast and assured recovery of their critical enterprise data.
Win a year’s supply of chocolate
There is no techie angle to this competition so we're not going to pretend there is, but everyone loves chocolate so who cares.
High Performance for All
While HPC is not new, it has traditionally been seen as a specialist area – is it now geared up to meet more mainstream requirements?
Intelligent flash storage arrays
Tegile Intelligent Storage Arrays with IntelliFlash helps IT boost storage utilization and effciency while delivering unmatched storage savings and performance.