Old fashioned engineering: HPC cluster kids would like to thank their fans. No really

Gutsy overclockers set new record


HPC Blog Excitement reigned at the Asia Student Supercomputer Challenge Cluster Competition as Zhejiang University set a new student LINPACK record with 12.03 Tflop/s.

With this result, they eclipsed the previous record of 10.78 Tflop/s set by Jamia Millia Islamia University (JMI) at last year’s ISC15 competition.

Here's a historical chart of past student cluster competition LINPACK records...


Setting a new high water mark in student clustering is great, but what’s really interesting is how they did it.

For the last couple of years, student cluster competition LINPACK records have been set by four-node systems with eight NVIDIA K80 accelerators, plus sophisticated liquid cooling. This has been the case ever since the University of Edinburgh became the first school to top the 10 Tflop/s mark with its 10.10 result at ISC14. JMI came along the next year, with essentially the same hardware configuration, and pushed the record up a bit more to 10.78 Tflop/s.

Guts = Glory

The kids from Zhejiang went about it a different way: they used old-fashioned engineering plus daredevil gutsiness to construct their system and make their run at LINPACK glory. First they figured out that their fans were consuming five to 10 per cent of the total electrical load on their four-node, eight-K80 cluster.

With that in mind, they looked at ways to minimise fan power. They determined that their existing rack structure wasn’t utilising airflow as well as it could, so they rearranged their rack to make it more windy.
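To put the fans' five to 10 per cent share in perspective, here is a rough back-of-the-envelope sketch. The total load figure is purely an assumption for illustration; the team's actual power draw isn't given here.

```python
def fan_power_watts(total_load_watts, fan_share):
    """Watts drawn by the fans, given total load and the fans' fractional share."""
    return total_load_watts * fan_share


# Hypothetical total load in watts (an assumption, not a reported figure).
ASSUMED_TOTAL_LOAD_W = 3000.0

# The five to 10 per cent range quoted for the fans.
low = fan_power_watts(ASSUMED_TOTAL_LOAD_W, 0.05)
high = fan_power_watts(ASSUMED_TOTAL_LOAD_W, 0.10)

print(f"Fans draw roughly {low:.0f} W to {high:.0f} W "
      f"out of an assumed {ASSUMED_TOTAL_LOAD_W:.0f} W")
```

Under that assumed load, every watt shaved off the fans is a watt that can go to the CPUs and GPUs instead.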

They then removed some fans, which is not for the faint of heart, and installed software to let them dynamically control the speed of the remaining fans. This gave them more watts to dedicate to computation. The final piece of the puzzle was some judicious overclocking of their K80 accelerators.
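What software the team used, and what policy it followed, isn't stated. As a minimal sketch of what dynamic fan control can look like, here is a simple linear fan curve. The function name, temperature thresholds and duty-cycle limits are all hypothetical.

```python
def fan_duty_percent(temp_c, min_temp_c=40.0, max_temp_c=80.0,
                     min_duty=30.0, max_duty=100.0):
    """Linear fan curve: map a temperature reading to a fan duty cycle (percent).

    All thresholds are illustrative assumptions, not the team's settings.
    """
    if temp_c <= min_temp_c:
        return min_duty  # cool enough: run fans at the floor to save power
    if temp_c >= max_temp_c:
        return max_duty  # too hot: run fans flat out
    # In between, scale the duty cycle linearly with temperature.
    fraction = (temp_c - min_temp_c) / (max_temp_c - min_temp_c)
    return min_duty + fraction * (max_duty - min_duty)


for temp in (35.0, 60.0, 85.0):
    print(f"{temp:.0f} C -> {fan_duty_percent(temp):.0f}% duty")
```

The idea is that the fans only spin as fast as the temperature demands, rather than running at a fixed high speed and burning power the whole time.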

The end result was a LINPACK score of 12.03 Tflop/s, which handily beat last year’s ISC15 record of 10.78 Tflop/s.
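For a sense of scale, the size of the jump can be worked out from the scores quoted above. This is just a quick arithmetic sketch; the helper name is made up for illustration.

```python
def percent_gain(new_score, old_score):
    """Percentage improvement of new_score over old_score."""
    return (new_score - old_score) / old_score * 100.0


# LINPACK scores in Tflop/s, as quoted in the article.
ZHEJIANG_ASC16 = 12.03
JMI_ISC15 = 10.78
EDINBURGH_ISC14 = 10.10

print(f"vs JMI at ISC15: {percent_gain(ZHEJIANG_ASC16, JMI_ISC15):.1f}%")
print(f"vs Edinburgh at ISC14: {percent_gain(ZHEJIANG_ASC16, EDINBURGH_ISC14):.1f}%")
```

That works out to roughly 11.6 per cent over the JMI mark, compared with the gain of about 6.7 per cent that JMI managed over Edinburgh the year before.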

I really have to take my hat off to these kids. Other teams have tried to reduce power consumption by using liquid cooling and other means, but no team has ever had the guts to try overclocking their CPUs or GPUs.

I can kind of see why they wouldn’t try to overclock their CPUs: most motherboards and BIOSes aren’t overclocking-friendly when it comes to processors. However, overclocking GPUs is done all the time in the enthusiast world and can be carried out relatively risk-free. Like most computing components, if GPUs get too hot, they’ll shut down automatically.

Great job, Team Zhejiang. Maybe you’ll inspire future student cluster warriors to take more risks. Enjoy the 10,000 RMB ($1,500, £1,000) award! ®





Biting the hand that feeds IT © 1998–2019