
What should replace Linpack for ranking supercomputers?

Our Big Iron man has a few ideas following Top500


This strikes me as a software problem, not a hardware problem; and not only that, it is one that doesn't have any particular bearing on the relevance of Linpack. And even Heroux and Dongarra basically admit as much in the next paragraph of the HPCG proposal:

"Of course, one of the important research efforts in HPC today is to design applications such that more computations are Type 1 patterns, and we will see progress in the coming years. At the same time, most applications will always have some Type 2 patterns and our benchmarks must reflect this reality. In fact, a system's ability to effectively address Type 2 patterns is an important indicator of system balance."

What this server hack really wants to know is how the initial Oak Ridge application set is performing on Titan, and how much work needs to be done to optimize the code and actually use those GPU accelerators that the US taxpayers (like me) have shelled out for.

I want something even more devious than that, too. I want to see the Top500 list completely reorganized, using its historical data in a more useful fashion. I want the list to show a machine and its immediate predecessor. There is always an immediate predecessor, so don't try to tell me there isn't, and it is running a key real-world application, too. So, for these two machines, I want three numbers: peak theoretical flops, sustained Linpack flops, and relative performance of the key workload (you won't be able to get actual performance data for this, of course).

I want the Top500 table to show the relative performance gains for all three across the two machines. And predecessor machines don't necessarily have to be in the same physical location, either. The "Kraken" super at the University of Tennessee runs code for NOAA, which also has its own machines.
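Here is a minimal sketch of the kind of before-and-after entry I am asking for; the machine names and figures below are placeholders of my own invention, not real submissions:

# A Top500-style entry that pairs a machine with its immediate predecessor
# and reports the deltas on peak flops, sustained Linpack flops, and one key
# real-world workload (relative, since absolute workload numbers won't be
# published). All figures here are made up for illustration.

from dataclasses import dataclass

@dataclass
class Entry:
    name: str
    peak_pflops: float        # theoretical peak
    linpack_pflops: float     # sustained Linpack (Rmax)
    workload_relative: float  # key application, predecessor normalised to 1.0

def deltas(new, old):
    return {
        "peak": new.peak_pflops / old.peak_pflops,
        "Linpack": new.linpack_pflops / old.linpack_pflops,
        "key workload": new.workload_relative / old.workload_relative,
    }

predecessor = Entry("Old Iron", 2.5, 2.0, 1.0)    # hypothetical machine
successor = Entry("New Iron", 25.0, 16.0, 2.5)    # hypothetical upgrade

for metric, factor in deltas(successor, predecessor).items():
    print("%-12s delta: %.2fx" % (metric, factor))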

So, for instance, it might show that Titan has a peak theoretical performance of 27.12 petaflops across its CPUs and GPUs, and running the Linpack test and loading up both computing elements it was able to deliver 17.59 petaflops of oomph in an 8.21 megawatt power envelope. Importantly, the Top500 list counts 560,640 cores in the Titan machine, with its Opteron cores running at 2.2GHz. The "Jaguar" super that predates Titan at Oak Ridge (technically, Jaguar was upgraded to Titan in two steps) was an all-CPU machine with 298,592 cores running at 2.6GHz that had a peak theoretical performance of 2.63 petaflops and a sustained Linpack performance of 1.94 petaflops, all in a 5.14 megawatt power envelope.

So the delta on peak performance moving from the last iteration of Jaguar (it had a processor and interconnect upgrade a little more than a year ago before adding the GPUs to make it a Titan late last year) was a factor of 10.31, but on sustained performance for the Linpack test, the delta between the two machines was only a factor of 9.06. And, by the way, that was only because a lot of the calculations were done on the GPUs, and even then, Titan still only had a computational efficiency across those CPUs and GPUs of 64.9 per cent. It may be that it just cannot be pushed much higher than that. The best machines on the list might have an 85 to 90 per cent computational efficiency.
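For the record, those deltas and that efficiency figure fall straight out of the published numbers; a quick sanity check:

# Recomputing the deltas and computational efficiency quoted above from the
# peak and sustained Linpack figures for Titan and Jaguar (in petaflops).
titan_peak, titan_linpack = 27.12, 17.59
jaguar_peak, jaguar_linpack = 2.63, 1.94

print("peak delta: %.2fx" % (titan_peak / jaguar_peak))                    # ~10.31x
print("sustained delta: %.2fx" % (titan_linpack / jaguar_linpack))         # ~9.1x
print("Titan efficiency: %.1f%%" % (100 * titan_linpack / titan_peak))     # ~64.9%
print("Jaguar efficiency: %.1f%%" % (100 * jaguar_linpack / jaguar_peak))  # ~73.8%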

But here's the fun bit. If you just look at the CPU portions of the Jaguar and Titan machines, then the aggregate floating point delta moving from Jaguar to Titan was a mere 58.9 per cent. In other words, if a key Oak Ridge application could, in theory, be deployed across nearly twice as many cores and a faster interconnect and make use of them all perfectly efficiently and did not offload any calculations to the GPU, then you get a speedup of about a factor of 1.6. If you offload some calculations to the Tesla GPUs - but only some, as Heroux and Dongarra said was the case - then maybe, and El Reg is guessing wildly here - maybe you could get that delta up to a factor of 2X, 3X, or even 4X.
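You can at least put a frame around that wild guess. Under a simple Amdahl-style model - my assumption, nothing from Oak Ridge or the HPCG proposal - the speedup depends almost entirely on what fraction of the runtime actually moves to the GPUs:

# Crude Amdahl's-law framing of the GPU-offload guess above. The offload
# fractions and the 10x per-kernel GPU speedup are assumptions for
# illustration, not measurements.

def amdahl_speedup(offload_fraction, gpu_kernel_speedup=10.0):
    # Relative to an all-CPU run of 1.0: the un-offloaded part still takes
    # (1 - f), the offloaded part takes f divided by the kernel speedup.
    return 1.0 / ((1.0 - offload_fraction) + offload_fraction / gpu_kernel_speedup)

for f in (0.25, 0.50, 0.75, 0.90):
    print("offload %.0f%% of the work: %.1fx over CPU-only" % (f * 100, amdahl_speedup(f)))

Offload a quarter of the work and you get about 1.3X; offload three quarters and you are already at the 3X end of that guess.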

The point is not to guess, but to have Oak Ridge tell us as part of a Top500 submission. Why are we all guessing? Moreover, why not have the Top500 ranked by the familiar Linpack, add an HPCG column, then one key real-world relative performance metric, and do the before and after machines? More data is better.

There are some other things that a revised supercomputer benchmark test needs as well, since we are mulling it over. Every machine has a list price, and every benchmark result should disclose it. Every machine has a measured wall power when it is running, and every benchmark result should disclose that, too. The Top500 list is fun, but it needs to be more useful, and the reality is that supercomputing is more constrained by the power envelope and the budget than by any other factors, so this data has to be brought into the equation. Even estimated street prices are better than nothing, and I am just annoyed enough to gin them up myself and get into a whole lot of trouble.
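If the list carried those two extra columns, the ratios that actually matter would fall out directly. A sketch, using the Titan wattage cited above and a list price that is purely my own invention:

# Price and power columns for a benchmark result. The 8.21MW figure is the
# Titan power draw cited above; the list price is an invented placeholder,
# since that is precisely the number nobody publishes.

def efficiency_columns(linpack_pflops, power_megawatts, list_price_million_usd):
    gflops = linpack_pflops * 1e6                   # petaflops to gigaflops
    return {
        "gigaflops per watt": gflops / (power_megawatts * 1e6),
        "gigaflops per dollar": gflops / (list_price_million_usd * 1e6),
    }

titan_columns = efficiency_columns(linpack_pflops=17.59, power_megawatts=8.21,
                                   list_price_million_usd=97.0)  # price is hypothetical
for name, value in titan_columns.items():
    print("%s: %.3f" % (name, value))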

Another thing to consider is that the needs of the very high end of the supercomputer space are far different from the needs of the rest of the HPC community.

At the high end, research labs and HPC centers are writing their own code, but smaller HPC installations are running other people's code. The addition of a real-world workload is useful for such customers, particularly if it is an off-the-shelf application. What would be truly useful is a set of clusters of varying sizes and technologies running the ten top HPC applications - perhaps one for each industry - with all the same before and after configurations and deltas as machines are upgraded in the field. I think what you will find is that a lot of commercial applications can't scale as far as you might hope across ever-increasing cluster sizes, and software developers in the HPC racket would be extremely averse to such odious comparisons. It would be nice to be surprised and see third-party apps scale well, so go ahead, I dare you - surprise me.
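One way to make those odious comparisons concrete would be to publish, for each packaged application, its parallel efficiency as the node count grows on the same fixed problem. Something this simple, fed with measured runtimes, would expose the scaling wall quickly; the runtimes below are invented purely for illustration:

# Parallel-efficiency table for an off-the-shelf application scaled across
# cluster sizes. The runtimes are made up; the point is the shape of the
# curve, not the numbers.

runs = {  # nodes -> wall-clock seconds for the same fixed problem
    16: 1000.0,
    32: 530.0,
    64: 300.0,
    128: 210.0,
    256: 190.0,
}

base_nodes = min(runs)
base_time = runs[base_nodes]
for nodes, seconds in sorted(runs.items()):
    speedup = base_time / seconds
    efficiency = speedup / (nodes / base_nodes)
    print("%4d nodes: %5.2fx speedup, %3.0f%% parallel efficiency" % (nodes, speedup, efficiency * 100))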

The HPC industry needs a complete set of benchmark results that shows a wide variety of components being mixed and matched in different ways, and the effect each combination has on application performance. We don't need 50 Linpack results on clusters with slightly different configurations; we need results on as many diverse systems as we can find and test. This is how Dongarra originally compiled his Linpack benchmarks, and it was the right way to do it. ®
