What should replace Linpack for ranking supercomputers?

Our Big Iron man has a few ideas following the Top500


ISC 2013 The Linpack Fortran benchmark, which has been used to gauge the relative performance of workstations and supercomputers for many decades, is looking a little long in the tooth. So, some of the people who love Linpack and know it best are proposing a new benchmark - with the mind-numbing name of High Performance Conjugate Gradient, or HPCG.

All system benchmark tests run their courses, and the most successful ones usually overstay their welcome and stop doing what they are designed to do. That is, allow for the comparative ranking of systems in terms of performance across different architectures and software stacks in a meaningful way, and - if we are lucky - provide some insight into bang vs buck for different systems so that companies can weigh the relative economic merits of systems that can run a particular application.

Tests run on supercomputers mostly do the former, and rarely do the latter. That is a big problem that Michael Heroux, of Sandia National Laboratories, and Jack Dongarra, of both the University of Tennessee and Oak Ridge National Laboratory, tackled in their HPCG benchmark proposal paper, which you can read here (PDF).

Earlier this week, your friendly neighborhood El Reg server hack defended the Linpack test as the Top500 supercomputer rankings for June 2013 were announced at the International Supercomputing Conference in Leipzig, Germany. Linpack is important because it is fun to see the rankings twice a year, and it is an easy enough test to run that people actually do it with some enthusiasm on their machines.

Erich Strohmaier, one of the administrators of the Top500 who hails from Lawrence Berkeley National Laboratory, tells El Reg that there were around 800 submissions of Linpack reports for the June Top500 list, and a bunch of them were tossed out because there was something funky in them. Making it to the Top500 list is a big deal for a lot of supercomputer labs, and every national and district politician around the world loves to see a new machine come online in their sphere of influence for the photo-op to prove they are doing their jobs to protect our future. And moving down in the rankings is also a big deal, because if you are not moving ahead you are falling behind. Jysoo Lee, director of the KISTI supercomputing center in Korea, found this out when Korea's relative rankings slipped in the June list. You get phone calls from people in power who are not pleased.

So Linpack matters and it doesn't matter at the same time. It is a bit like miles per gallon fuel efficiency ratings for cars, or if you live in New York, letter grade ratings for restaurants that the city mandates.

But when it comes to using Linpack as a relative performance metric, there are some issues. If you really want to hurt your head, delve into the HPCG paper put out by Heroux and Dongarra. The basic gist, as this server hack understands it, is that the kind of codes that were initially deployed on parallel clusters fifteen years ago bore a closer relationship to High Performance Linpack, or HPL, than today's codes do in all cases. (HPL is the parallel implementation of Linpack; earlier versions could only run on a single machine, such as a PC at the low end or a federated SMP server with shared memory or a vector processor at the high end.) This is one of the reasons why the University of Illinois has refused to run Linpack on the "Blue Waters" ceepie-geepie built by Cray. (Remember, Linpack is voluntary. It is not so much a ranking of the Top500 supercomputers as it is a ranking of the Top500 supercomputers from organizations that want to brag.)

Here's the central problem as Heroux and Dongarra see it:

"At the same time HPL rankings of computer systems are no longer so strongly correlated to real application performance, especially for the broad set of HPC applications governed by differential equations, which tend to have much stronger needs for high bandwidth and low latency, and tend to access data using irregular patterns. In fact, we have reached a point where designing a system for good HPL performance can actually lead to design choices that are wrong for the real application mix, or add unnecessary components or complexity to the system."

There is a lot of deep technical description in the paper, but here is the really simplified version. Linpack is a set of calculations using matrix math that scales up the size of the arrays to try to choke the machine and force it to reach its peak capacity across all of the computing elements in the cluster. The original Linpack from the dawn of time solved a dense system of 100 linear equations (a 100 x 100 matrix), then it moved to 1,000 equations as machines got more powerful, and then the Linpackers took the hood off and let the linear equation count scale and tweaked the HPL code to run across clusters and take advantage of modern networks and interconnects as well as coprocessors like Intel Xeon Phi and Nvidia Tesla cards.
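To make that concrete, here is a toy sketch of the kind of computation HPL times: solving a dense system by Gaussian elimination and dividing the standard operation count by the wall-clock time to get a flop rate. This is illustrative only, not the real HPL code; the function name and problem setup are our own.

```python
import random
import time

def linpack_style_solve(n):
    """Solve a random dense n-by-n system Ax = b by Gaussian elimination
    with partial pivoting -- the same class of dense work HPL measures."""
    random.seed(0)  # reproducible toy run
    A = [[random.random() for _ in range(n)] for _ in range(n)]
    x_true = [1.0] * n
    # Build b so the exact answer is a vector of ones.
    b = [sum(A[i][j] * x_true[j] for j in range(n)) for i in range(n)]

    start = time.perf_counter()
    # Forward elimination with partial pivoting.
    for k in range(n):
        pivot = max(range(k, n), key=lambda i: abs(A[i][k]))
        A[k], A[pivot] = A[pivot], A[k]
        b[k], b[pivot] = b[pivot], b[k]
        for i in range(k + 1, n):
            factor = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= factor * A[k][j]
            b[i] -= factor * b[k]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = b[i] - sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / A[i][i]
    elapsed = time.perf_counter() - start

    flops = (2.0 / 3.0) * n ** 3 + 2.0 * n ** 2  # HPL's standard flop count
    return x, flops / elapsed

x, rate = linpack_style_solve(100)
print("achieved %.3f Mflop/s" % (rate / 1e6))
```

Scale n up far enough and the solve saturates memory, caches, and every floating point unit in sight, which is precisely the game the Top500 plays.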

The kind of calculations and data access patterns in the Linpack test are what Heroux and Dongarra refer to as Type 1. The machines are loaded up with lots of floating point processing and the data can be organized in such a way as to be particularly efficient at getting the right data to the right FP unit at the right time most of the time. With Type 2 patterns - which are more reflective of the kinds of differential equations increasingly used in simulations - the data access patterns are less regular and the calculations are finer-grained and recursive. Those shiny new Xeon Phi and Tesla accelerators are really good at Type 1 calculations and not so hot (yet) on Type 2 calculations.
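For contrast, HPCG's namesake algorithm is a Type 2 workload. Below is a textbook conjugate gradient sketch applied to a sparse matrix where each row only touches its neighbours, so the matrix-vector product is dominated by scattered memory accesses rather than dense flops. This is our own illustration, not code from the HPCG proposal; the 1D Poisson matrix is a stand-in for the differential-equation systems the paper talks about.

```python
def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Textbook conjugate gradient for a symmetric positive definite system.
    Each iteration is one sparse matrix-vector product plus a few dot
    products -- low arithmetic intensity, bandwidth- and latency-bound."""
    x = [0.0] * len(b)
    r = b[:]          # residual: r = b - A*0 = b
    p = r[:]
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rs_old / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new ** 0.5 < tol:
            break
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x

def poisson_matvec(v):
    """Sparse 1D Poisson operator: row i touches only entries i-1, i, i+1,
    so the access pattern is irregular gather/scatter, not dense blocks."""
    n = len(v)
    out = [0.0] * n
    for i in range(n):
        out[i] = 2.0 * v[i]
        if i > 0:
            out[i] -= v[i - 1]
        if i < n - 1:
            out[i] -= v[i + 1]
    return out

n = 50
b = [1.0] * n
x = conjugate_gradient(poisson_matvec, b)
```

A machine tuned for dense Type 1 work can look heroic on the first sketch and ordinary on this one, which is the gap HPCG is meant to expose.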

The upshot, say Heroux and Dongarra, is that designing a system to reach one exaflops on Linpack might result in a machine that is absolutely terrible at running real-world applications, thus defeating one of the primary purposes of a benchmark test. Benchmarks are a feedback loop for system designers, and are absolutely necessary. Here is an example of the divergence cited by Heroux and Dongarra on the "Titan" Opteron-Tesla ceepie-geepie down at Oak Ridge National Laboratory, with which they are intimately familiar.

Read this carefully:

"The Titan system at Oak Ridge National Laboratory has 18,688 nodes, each with a 16-core, 32 GB AMD Opteron processor and a 6GB Nvidia K20 GPU. Titan was the top-ranked system in November 2012 using HPL. However, in obtaining the HPL result on Titan, the Opteron processors played only a supporting role in the result. All floating-point computation and all data were resident on the GPUs. In contrast, real applications, when initially ported to Titan, will typically run solely on the CPUs and selectively off-load computations to the GPU for acceleration."
