Feeds

China takes HPC heavyweight title

GPUs, Arch interconnect knocks out Jaguar and Roadrunner

Beginner's guide to SSL certificates

If it wasn't immediately obvious that China is a superpower, today's announcement that the Tianhe-1A CPU-GPU hybrid is the most powerful supercomputer in the world - and by a comfortable margin - will make it abundantly clear.

China wants to move from being a manufacturing powerhouse to being a full player in the 21st century technological economy, and it is making the investments to transform itself.

The National Supercomputer Center in Tianjin, China, this morning rolled out the Top 100 rankings of the country's fastest supercomputers (based on the Linpack Fortran benchmark test, like the global Top 500 list). The Tianhe-1A (which is translated from Chinese for "River in the Sky" or "Milky Way" with a model number slapped on it) beat out all of its rivals. The supercomputer is based on a rack server design created by the National University of Defense Technology (NUDT), and comprises 14,336 Xeon processors and 7,168 of Nvidia's Tesla M2050 fanless GPU co-processors.

The resulting machine has a peak theoretical performance of 4.7 petaflops, which is a gargantuan amount of raw performance, but where the rubber hits the road on the Linpack test, the machine delivers 2.51 petaflops.

That means 47 per cent of the theoretical performance of the machine is going up the chimney. This is not particularly good. But with CPU-GPU clusters costing roughly about a quarter of the cost of CPU clusters, according to Sumit Gupta, product marketing manager for the Tesla product line, on teraflops-for-teraflops basis, the inefficiency can be tolerated to make up for scalability. For now, at least.

Coders and hardware engineers the world over will now be trying to boost efficiencies on the PCI-Express bus, on the system interconnects, and in the software stack to get the sustained performance a lot closer to the peak for ceepie-geepie hybrid machines. Gupta says that the GPUs are responsible for around 70 per cent of the calculations that were done on the Linpack test.

Like the USS Enterprise, the Tianhe-1A, as the name suggests, is not the first hybrid parallel super that China has put into the field. The Tianhe-1 cluster, based on Intel Xeon chips and Advanced Micro Devices Radeon HD 4870 GPUs, broke onto the Top 500 list in November 2009. That machine had 71,680 cores and had a peak theoretical performance of 1.2 petaflops and a sustained performance of 563.1 teraflops. In that case, 53 per cent of the aggregate performance went up the chimney.

China's Tianahe-1A Supercomputer

The Tianhe-1A CPU-GPU hybrid super

The Tianhe-1A super is not important just because it is now the fastest supercomputer in the world, but because NUDT has spent years developing its own proprietary interconnect for the server nodes. And as El Reg previously reported, a future generation of Tianhe machines will use a homegrown multi-core processor, called Godson and based on the MIPS core. (So when does China's Institute of Computing Technology, part of the Chinese Academy of Sciences, start making its own GPUs?)

According to sources at Nvidia, which had people on the floor at the unveiling of Tianhe-1A in China this morning, the proprietary interconnect is called Arch and it links the server nodes together using optical-electric cables in a hybrid fat tree configuration. The switch at the heart of Arch has a bi-directional bandwidth of 160 Gb/sec, a latency for a node hop of 1.57 microseconds, and an aggregate bandwidth of more than 61 Tb/sec.

Some people have been suggesting that this interconnect somehow links the GPUs to the CPUs, but I am fairly certain that the GPUs hook to the CPUs by the plain old PCI-Express 2.0 bus in the server nodes. It would be very interesting if this interconnect has something akin to Remote Direct Memory Access, which allows a node to reach into and directly talk over the PCI-Express bus to the memory in a GPU in another node. Nvidia didn't mention this, and no one else has either, but that could significantly speed up performance if the Arch switch has such a feature.

The Tianhe-1A super has an aggregate of 262 TB of main memory and 2 PB of storage implemented as a Lustre clustered file system. The machine is comprised of 112 compute racks, eight storage node cabinets, six communications racks, and 14 I/O racks.

I personally welcome our Chinese HPC overlords. It's hard not to when my government owes their government $2 trillion, right? ®

Security for virtualized datacentres

More from The Register

next story
It's Big, it's Blue... it's simply FABLESS! IBM's chip-free future
Or why the reversal of globalisation ain't gonna 'appen
'Hmm, why CAN'T I run a water pipe through that rack of media servers?'
Leaving Las Vegas for Armenia kludging and Dubai dune bashing
Facebook slurps 'paste sites' for STOLEN passwords, sprinkles on hash and salt
Zuck's ad empire DOESN'T see details in plain text. Phew!
CAGE MATCH: Microsoft, Dell open co-located bit barns in Oz
Whole new species of XaaS spawning in the antipodes
Microsoft and Dell’s cloud in a box: Instant Azure for the data centre
A less painful way to run Microsoft’s private cloud
AWS pulls desktop-as-a-service from the PC
Support for PCoIP protocol means zero clients can run cloudy desktops
prev story

Whitepapers

Choosing cloud Backup services
Demystify how you can address your data protection needs in your small- to medium-sized business and select the best online backup service to meet your needs.
Forging a new future with identity relationship management
Learn about ForgeRock's next generation IRM platform and how it is designed to empower CEOS's and enterprises to engage with consumers.
Security for virtualized datacentres
Legacy security solutions are inefficient due to the architectural differences between physical and virtual environments.
Reg Reader Research: SaaS based Email and Office Productivity Tools
Read this Reg reader report which provides advice and guidance for SMBs towards the use of SaaS based email and Office productivity tools.
Storage capacity and performance optimization at Mizuno USA
Mizuno USA turn to Tegile storage technology to solve both their SAN and backup issues.