
China takes HPC heavyweight title

GPUs, Arch interconnect knock out Jaguar and Roadrunner

If it wasn't immediately obvious that China is a superpower, today's announcement that the Tianhe-1A CPU-GPU hybrid is the most powerful supercomputer in the world - and by a comfortable margin - will make it abundantly clear.

China wants to move from being a manufacturing powerhouse to being a full player in the 21st century technological economy, and it is making the investments to transform itself.

The National Supercomputer Center in Tianjin, China, this morning rolled out the Top 100 rankings of the country's fastest supercomputers (based on the Linpack Fortran benchmark test, like the global Top 500 list). The Tianhe-1A (the name translates from Chinese as "River in the Sky" or "Milky Way", with a model number slapped on it) beat out all of its rivals. The supercomputer is based on a rack server design created by the National University of Defense Technology (NUDT), and comprises 14,336 Xeon processors and 7,168 of Nvidia's Tesla M2050 fanless GPU co-processors.

The resulting machine has a peak theoretical performance of 4.7 petaflops, which is a gargantuan amount of raw performance, but where the rubber hits the road on the Linpack test, the machine delivers 2.51 petaflops.

That means 47 per cent of the theoretical performance of the machine is going up the chimney. This is not particularly good. But according to Sumit Gupta, product marketing manager for the Tesla product line, CPU-GPU clusters cost roughly a quarter as much as CPU-only clusters on a teraflops-for-teraflops basis, so the inefficiency can be tolerated in exchange for scalability. For now, at least.

Coders and hardware engineers the world over will now be trying to boost efficiencies on the PCI-Express bus, on the system interconnects, and in the software stack to get the sustained performance a lot closer to the peak for ceepie-geepie hybrid machines. Gupta says that the GPUs are responsible for around 70 per cent of the calculations that were done on the Linpack test.

Like the USS Enterprise, the Tianhe-1A, as the name suggests, is not the first hybrid parallel super that China has put into the field. The Tianhe-1 cluster, based on Intel Xeon chips and Advanced Micro Devices Radeon HD 4870 GPUs, broke onto the Top 500 list in November 2009. That machine had 71,680 cores and had a peak theoretical performance of 1.2 petaflops and a sustained performance of 563.1 teraflops. In that case, 53 per cent of the aggregate performance went up the chimney.
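The two "up the chimney" figures quoted above are simple arithmetic on the peak and sustained Linpack numbers. Here is a minimal sketch of that calculation, using only the figures given in this story; the function and variable names are made up for illustration and do not come from NUDT, Nvidia, or the Top 500 organizers.

```python
# Peak and sustained Linpack figures as quoted in the article, in teraflops.
# The names used here are illustrative only.
MACHINES = {
    "Tianhe-1A": {"sustained": 2510.0, "peak": 4700.0},
    "Tianhe-1": {"sustained": 563.1, "peak": 1200.0},
}


def linpack_efficiency(sustained_tflops, peak_tflops):
    """Return the fraction of theoretical peak delivered on Linpack."""
    if peak_tflops <= 0:
        raise ValueError("peak performance must be positive")
    return sustained_tflops / peak_tflops


def lost_percent(name):
    """Return the share of peak performance lost, rounded to a whole per cent."""
    figures = MACHINES[name]
    efficiency = linpack_efficiency(figures["sustained"], figures["peak"])
    return round((1.0 - efficiency) * 100)


if __name__ == "__main__":
    for name in MACHINES:
        print(f"{name}: {lost_percent(name)} per cent of peak lost")
```

On these numbers the newer machine wastes a smaller share of its peak than its predecessor did, although both still leave roughly half of it on the table.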

China's Tianhe-1A Supercomputer

The Tianhe-1A CPU-GPU hybrid super

The Tianhe-1A super is important not just because it is now the fastest supercomputer in the world, but because NUDT has spent years developing its own proprietary interconnect for the server nodes. And as El Reg previously reported, a future generation of Tianhe machines will use a homegrown multi-core processor, called Godson and based on the MIPS core. (So when does China's Institute of Computing Technology, part of the Chinese Academy of Sciences, start making its own GPUs?)

According to sources at Nvidia, which had people on the floor at the unveiling of Tianhe-1A in China this morning, the proprietary interconnect is called Arch and it links the server nodes together using optical-electric cables in a hybrid fat tree configuration. The switch at the heart of Arch has a bi-directional bandwidth of 160 Gb/sec, a latency for a node hop of 1.57 microseconds, and an aggregate bandwidth of more than 61 Tb/sec.

Some people have been suggesting that this interconnect somehow links the GPUs to the CPUs, but I am fairly certain that the GPUs hook to the CPUs by the plain old PCI-Express 2.0 bus in the server nodes. It would be very interesting if this interconnect has something akin to Remote Direct Memory Access, letting a node reach across the PCI-Express bus and talk directly to the memory in a GPU on another node. Nvidia didn't mention this, and no one else has either, but such a feature in the Arch switch could significantly speed up performance.

The Tianhe-1A super has an aggregate of 262 TB of main memory and 2 PB of storage implemented as a Lustre clustered file system. The machine comprises 112 compute racks, eight storage node cabinets, six communications racks, and 14 I/O racks.

I personally welcome our Chinese HPC overlords. It's hard not to when my government owes their government $2 trillion, right? ®
