Green supercomputer benchmarks make boffins see red, check blueprints

Top 10 use GPU brawn, but has anyone any bright ideas on better juicy tests?

All well and good, but...

During a Wednesday session at SC13 discussing the Green500, all participants agreed – including representatives from Green500 itself – that testing, scoring, and ranking systems based on their power consumption is an inexact science in need of repair.

Although Green500 does publish rules governing the running of energy-measurement tests for the submission of scores for ranking, and has collaborated with the Energy Efficient High Performance Computing Working Group (EE HPC WG) on a three-level methodology [PDF] for testing, all the HPCers involved freely admit that much more work needs to be done to clarify the testing and reporting procedures.

Of those three levels of testing, Level 1 is the simplest, and is always required for Green500 entry submission. Its simplicity, however, is almost an understatement: in Level 1 testing, the only subsystem under test is the compute system – forget about storage and networking. They don't appear until Level 2, in which "all subsystems participating in the workload must be measured or estimated."

Leaving aside the question of the difficulty of accurately and discretely measuring power on any subsystems at any level, the temptation to keep the testing as simplistic as reasonably possible lies in the fact that the higher the testing level, the lower the Mflops/W rating. Usually. Mostly. Probably. Often. Sometimes. Maybe.

During the session, Thomas Schulthess of CSCS, Piz Daint's home, was adamant that Level 3 – which takes the Level 2 rules and makes them more stringent – is the only truly legitimate way of measuring a system's power consumption, although Level 2 is acceptable as well. Level 1 is his bête noire.

"I am a physicist by training," he said, "a professor of physics at ETH, and I have to measure that one number – that's the true number. There are no two different numbers or two different efficiencies in systems."

When Schulthess ran Level 1 testing on Piz Daint, he obtained one number; when he ran Level 3 testing, the score was more accurate but less impressive. The difference was more than enough to make the "true number"-seeking physicist in him uncomfortable, seeing as how the Green500 accepts Level 1–tested submissions.

Running Linpack on Piz Daint and using Level 3–class analysis, Schulthess and his team came up with a score of 3,186 Mflops/W. Running under Level 1 rules, that score jumped to 3,864 Mflops/W.
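Since the Linpack performance of the machine is the same in both runs, the gap between those two scores comes entirely from how much power gets counted. A minimal Python sketch of the arithmetic – the two Mflops/W scores are from the article, but the sustained Linpack figure is an illustrative assumption, not a reported number:

```python
# The two reported efficiency scores for Piz Daint's Linpack run (Mflops/W).
level3_score = 3186  # Level 3 rules: stricter power accounting
level1_score = 3864  # Level 1 rules: compute subsystem only

# Hypothetical sustained Linpack performance, in Mflops (6.2 Pflops is an
# assumed figure for illustration, not one taken from the article).
rmax_mflops = 6.2e9

# Power implied by each score: watts = performance / efficiency.
level3_watts = rmax_mflops / level3_score
level1_watts = rmax_mflops / level1_score

# The Level 1 score flatters the machine by leaving subsystems unmeasured.
inflation = level1_score / level3_score - 1
print(f"Level 3 implies {level3_watts / 1e6:.2f} MW, "
      f"Level 1 implies {level1_watts / 1e6:.2f} MW "
      f"({inflation:.1%} higher efficiency on paper)")
```

Whatever performance figure you plug in, the ratio of the scores is fixed: reporting under Level 1 rules made Piz Daint look roughly 21 per cent more efficient than the Level 3 measurement of the same run.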

"People said we should have submitted this" higher Level 1 number to Green500, he said, "but it is wrong. It is the wrong number." So Schulthess submitted a Level 3 score even though his competition was allowed to submit Level 1 scores. Piz Daint ended up at number four instead of number two.

"Every center, every system owner and system operator, is responsible to publish the right number," he said. "It's just the way that things are done in science, and I hope the supercomputing community adheres to the same rules. I'm not sure, from the numbers I've seen, whether this is always the case."

It must be quickly noted that your Reg reporter could detect no rancor among the participants in the SC13 discussion on how best and most honestly to measure the energy efficiency of HPC systems. Instead, the participants evidenced a sincere desire to get it right.

Green500 and interested members of the HPC community are going to be hammering away at this problem in the coming months. If you're interested in joining the discussion, you can contact them on their website.

Nobody – nobody in their right mind, that is – ever said that benchmark development was easy. ®
