
China takes HPC heavyweight title

GPUs, Arch interconnect knock out Jaguar and Roadrunner


If it wasn't immediately obvious that China is a superpower, today's announcement that the Tianhe-1A CPU-GPU hybrid is the most powerful supercomputer in the world - and by a comfortable margin - will make it abundantly clear.

China wants to move from being a manufacturing powerhouse to being a full player in the 21st century technological economy, and it is making the investments to transform itself.

The National Supercomputer Center in Tianjin, China, this morning rolled out the Top 100 rankings of the country's fastest supercomputers (based on the Linpack Fortran benchmark test, like the global Top 500 list). The Tianhe-1A (the name translates from Chinese as "River in the Sky" or "Milky Way", with a model number slapped on it) beat out all of its rivals. The supercomputer is based on a rack server design created by the National University of Defense Technology (NUDT), and comprises 14,336 Xeon processors and 7,168 of Nvidia's Tesla M2050 fanless GPU co-processors.

The resulting machine has a peak theoretical performance of 4.7 petaflops, which is a gargantuan amount of raw performance, but where the rubber hits the road on the Linpack test, the machine delivers 2.51 petaflops.

That means 47 per cent of the theoretical performance of the machine is going up the chimney. This is not particularly good. But according to Sumit Gupta, product marketing manager for the Tesla product line, CPU-GPU clusters cost roughly a quarter of what CPU-only clusters do on a teraflops-for-teraflops basis, so the inefficiency can be tolerated in exchange for scalability. For now, at least.
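To see why the inefficiency is tolerable, here is a rough back-of-the-envelope sketch in Python. The peak and Linpack figures and the one-quarter cost ratio come from the article; the 75 per cent efficiency assigned to a CPU-only cluster is purely an illustrative assumption, as is the reading of Gupta's cost ratio as applying to peak teraflops.

```python
# Figures quoted in the article for Tianhe-1A.
PEAK_PFLOPS = 4.7
LINPACK_PFLOPS = 2.51

# Gupta's rough figure: a hybrid costs about a quarter of a CPU-only
# cluster. ASSUMPTION: the ratio is per peak teraflops.
HYBRID_COST_PER_PEAK = 0.25
CPU_COST_PER_PEAK = 1.0

# ASSUMPTION: hypothetical Linpack efficiency for a CPU-only cluster.
# This number is not from the article; change it to test other cases.
ASSUMED_CPU_EFFICIENCY = 0.75


def cost_per_sustained(cost_per_peak, efficiency):
    """Relative cost of one sustained teraflops, given cost per peak teraflops."""
    return cost_per_peak / efficiency


hybrid_efficiency = LINPACK_PFLOPS / PEAK_PFLOPS
hybrid_cost = cost_per_sustained(HYBRID_COST_PER_PEAK, hybrid_efficiency)
cpu_cost = cost_per_sustained(CPU_COST_PER_PEAK, ASSUMED_CPU_EFFICIENCY)

print(f"Hybrid Linpack efficiency: {hybrid_efficiency:.1%}")
print(f"Hybrid cost per sustained teraflops, relative to CPU-only: {hybrid_cost / cpu_cost:.2f}")
```

Under these assumptions the hybrid still comes out well ahead per sustained teraflops. It would only lose its edge if its efficiency fell below about a quarter of the CPU-only cluster's.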

Coders and hardware engineers the world over will now be trying to boost efficiencies on the PCI-Express bus, on the system interconnects, and in the software stack to get the sustained performance a lot closer to the peak for ceepie-geepie hybrid machines. Gupta says that the GPUs are responsible for around 70 per cent of the calculations that were done on the Linpack test.

Like the USS Enterprise, the Tianhe-1A, as the name suggests, is not the first hybrid parallel super that China has put into the field. The Tianhe-1 cluster, based on Intel Xeon chips and Advanced Micro Devices Radeon HD 4870 GPUs, broke onto the Top 500 list in November 2009. That machine had 71,680 cores and had a peak theoretical performance of 1.2 petaflops and a sustained performance of 563.1 teraflops. In that case, 53 per cent of the aggregate performance went up the chimney.
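The up-the-chimney percentages for the two machines can be checked with a few lines of Python. This is only a sketch of the arithmetic using the peak and sustained figures quoted in this article; the function and variable names are made up for illustration.

```python
def linpack_efficiency(sustained_tflops, peak_tflops):
    """Fraction of theoretical peak actually delivered on Linpack."""
    return sustained_tflops / peak_tflops


# (sustained teraflops, peak teraflops), as quoted in the article.
machines = {
    "Tianhe-1": (563.1, 1200.0),
    "Tianhe-1A": (2510.0, 4700.0),
}

# Fraction of peak lost on Linpack, per machine.
lost = {}
for name, (sustained, peak) in machines.items():
    efficiency = linpack_efficiency(sustained, peak)
    lost[name] = 1.0 - efficiency
    print(f"{name}: {efficiency:.1%} delivered, {lost[name]:.1%} up the chimney")
```

On these numbers the newer machine wastes around six percentage points less of its peak than its predecessor did.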


The Tianhe-1A CPU-GPU hybrid super

The Tianhe-1A super is not important just because it is now the fastest supercomputer in the world, but because NUDT has spent years developing its own proprietary interconnect for the server nodes. And as El Reg previously reported, a future generation of Tianhe machines will use a homegrown multi-core processor, called Godson and based on the MIPS core. (So when does China's Institute of Computing Technology, part of the Chinese Academy of Sciences, start making its own GPUs?)

According to sources at Nvidia, which had people on the floor at the unveiling of Tianhe-1A in China this morning, the proprietary interconnect is called Arch and it links the server nodes together using optical-electric cables in a hybrid fat tree configuration. The switch at the heart of Arch has a bi-directional bandwidth of 160 Gb/sec, a latency for a node hop of 1.57 microseconds, and an aggregate bandwidth of more than 61 Tb/sec.

Some people have been suggesting that this interconnect somehow links the GPUs to the CPUs, but I am fairly certain that the GPUs hook to the CPUs by the plain old PCI-Express 2.0 bus in the server nodes. It would be very interesting if this interconnect has something akin to Remote Direct Memory Access, which allows a node to reach into and directly talk over the PCI-Express bus to the memory in a GPU in another node. Nvidia didn't mention this, and no one else has either, but that could significantly speed up performance if the Arch switch has such a feature.

The Tianhe-1A super has an aggregate of 262 TB of main memory and 2 PB of storage implemented as a Lustre clustered file system. The machine comprises 112 compute racks, eight storage node cabinets, six communications racks, and 14 I/O racks.

I personally welcome our Chinese HPC overlords. It's hard not to when my government owes their government $2 trillion, right? ®

