What should replace Linpack for ranking supercomputers?

Our Big Iron man has a few ideas following Top500

Beginner's guide to SSL certificates

There is always an immediate predecessor, so don't try to tell me there isn't

This strikes me as a software problem, not a hardware problem: and not only that, one that doesn't have any particular bearing on the relevance of Linpack. And even Heroux and Dongarra basically admit as much in the next paragraph of the HPCG proposal:

"Of course, one of the important research efforts in HPC today is to design applications such that more computations are Type 1 patterns, and we will see progress in the coming years. At the same time, most applications will always have some Type 2 patterns and our benchmarks must reflect this reality. In fact, a system's ability to effectively address Type 2 patterns is an important indicator of system balance."

What this server hack really wants to know is how the initial Oak Ridge application set is performing on Titan, how much work needs to be done to optimize the code and actually use those GPU accelerators that the US taxpayers (like me) have shelled out for.

I want something even more devious than that, too. I want to see the Top500 list completely reorganized, using its historical data in a more useful fashion. I want the list to show a machine and its immediate predecessor. There is always an immediate predecessor, so don't try to tell me there isn't, and it is running a key real-world application, too. So, for these two machines, I want three numbers: Peak theoretical flops, sustained Linpack flops, and relative performance of the key workload (you won't be able to get actual performance data for this, of course.)

I want the Top500 table to show the relative performance gains for all three across the two machines. And predecessor machines don't necessarily have to be in the same physical location, either. The "Kraken" super at the University of Tennessee runs code for NOAA, which also has its own machines.

So, for instance, it might show that Titan has a peak theoretical performance of 27.12 petaflops across its CPUs and GPUs, and running the Linpack test and loading up both computing elements it was able to deliver 17.59 petaflops of oomph in an 8.21 megawatt power envelope. Importantly, the Titan machine has 560,640 Opteron cores running at 2.2GHz. The "Jaguar" super that predates Titan at Oak Ridge (technically, Jaguar was upgraded to Titan in two steps) was an all-CPU machine with 298,592 cores running at 2.6GHz that had a peak theoretical performance of 2.63 petaflops and a sustained Linpack performance of 1.94 petaflops, all in a 5.14 megawatt power envelope.

So the delta on peak performance moving from the last iteration of Jaguar (it had a processor and interconnect upgrade a little more than a year ago before adding the GPUs to make it a Titan late last year) was a factor of 10.31, but on sustained performance for the Linpack test, the delta between the two machines was only a factor of 9.06. And, by the way, that was only because a lot of the calculations were done on the GPUs, and even then, Titan still only had a computational efficiency across those CPUs and GPUs of 64.9 per cent. It may be that it just cannot be pushed much higher than that. The best machines on the list might have an 85 to 90 per cent computational efficiency.

But here's the fun bit. If you just look at the CPU portions of the Jaguar and Titan machines, then the aggregate floating point delta moving from Jaguar to Titan was a mere 58.9 per cent. In other words, if a key Oak Ridge application could, in theory, be deployed across nearly twice as many cores and a faster interconnect and make use of them all perfectly efficiently and did not offload any calculations to the GPU, then you get a speedup of about a factor of 1.6. If you offload some calculations to the Tesla GPUs - but only some, as Heroux and Dongarra said was the case - then maybe, and El Reg is guessing wildly here - maybe you could get that delta up to a factor of 2X, 3X, or even 4X.

The point is to not guess, but for Oak Ridge to say as part of a Top500 submission. Why are we all guessing? Moreover, why not have Top500 ranked by the familiar Linpack, add a HPCG column, then one key real-world relative performance metric, and do the before and after machines? More data is better.

There are some other things that a revised supercomputer benchmark test needs as well, since we are mulling it over. Every machine has a list price, and every benchmark run should have one. Every machine has a measured wall power when it is running, and every benchmark run should have one. The Top500 list is fun, but it needs to be more useful, and the reality is that supercomputing is more constrained by the power envelope and the budget than any other factors, and this data has to be brought into the equation. Even estimated street prices are better than nothing, and I am just annoyed enough to gin them up myself and get into a whole lot of trouble.

Another thing to consider is that the needs of the very high end of the supercomputer space are far different from the needs of the rest of the HPC community.

At the high end, research labs and HPC centers are writing their own code, but smaller HPC installations are running other people's code. The addition of a real-world workload is useful for such customers in particular if it is an off-the-shelf application. What would be truly useful is a set of clusters of varying sizes and technologies running the ten top HPC applications - perhaps one for each industry - with all the same before and after configurations and deltas as machines are upgraded in the field. I think what you will find is that a lot of commercial applications can't scale as far as you might hope across ever-increasing cluster sizes, and software developers in the HPC racket would be extremely adverse to such odious comparisons. It would be nice to be surprised and see third-party apps scale well, so go ahead, I dare you - surprise me.

The HPC industry needs a complete set of benchmark test results that show a wide variety of components being mixed and matched in different patterns and showing the effects on application performance. We don't need 50 different Linpack test results on clusters with slightly different configurations, but results on as many diverse systems as we can find and test. This is how Dongarra originally compiled his Linpack benchmarks, and it was the right way to do it. ®

Intelligent flash storage arrays

More from The Register

next story
NSA SOURCE CODE LEAK: Information slurp tools to appear online
Now you can run your own intelligence agency
Fat fingered geo-block kept Aussies in the dark
NASA launches new climate model at SC14
75 days of supercomputing later ...
Yahoo! blames! MONSTER! email! OUTAGE! on! CUT! CABLE! bungle!
Weekend woe for BT as telco struggles to restore service
Cloud unicorns are extinct so DiData cloud mess was YOUR fault
Applications need to be built to handle TITSUP incidents
BOFH: WHERE did this 'fax-enabled' printer UPGRADE come from?
Don't worry about that cable, it's part of the config
Stop the IoT revolution! We need to figure out packet sizes first
Researchers test 802.15.4 and find we know nuh-think! about large scale sensor network ops
SanDisk vows: We'll have a 16TB SSD WHOPPER by 2016
Flash WORM has a serious use for archived photos and videos
Astro-boffins start opening universe simulation data
Got a supercomputer? Want to simulate a universe? Here you go
prev story


Seattle children’s accelerates Citrix login times by 500% with cross-tier insight
Seattle Children’s is a leading research hospital with a large and growing Citrix XenDesktop deployment. See how they used ExtraHop to accelerate launch times.
Why CIOs should rethink endpoint data protection in the age of mobility
Assessing trends in data protection, specifically with respect to mobile devices, BYOD, and remote employees.
A strategic approach to identity relationship management
ForgeRock commissioned Forrester to evaluate companies’ IAM practices and requirements when it comes to customer-facing scenarios versus employee-facing ones.
High Performance for All
While HPC is not new, it has traditionally been seen as a specialist area – is it now geared up to meet more mainstream requirements?
Protecting against web application threats using SSL
SSL encryption can protect server‐to‐server communications, client devices, cloud resources, and other endpoints in order to help prevent the risk of data loss and losing customer trust.