Japanese boffins fire up 802 teraflops ceepie-geepie

Another Xeon E5 machine, with some FPGA special sauce

Upstart supercomputer maker Appro International has started up another Xeon E5-based supercomputer, this one a hybrid CPU-GPU Xtreme-X machine installed at the University of Tsukuba in Japan.

The deal at Tsukuba was announced last September at about the same time that Intel was widely expected to launch the "Sandy Bridge-EP" Xeon E5 processors, which are now due for launch sometime before the end of the first quarter. It is not clear how Appro was able to jump to the head of the Xeon E5 line, but it is worth noting that Appro now uses Intel motherboards as well as processors in its Xtreme-X clusters.

The University of Tsukuba's "Frontier" CPU-GPU supercomputer

The Tsukuba machine is nicknamed "Frontier" but officially called the much more boring HA-PACS, short for Highly Accelerated Parallel Advanced system for Computational Sciences. The feeds and speeds of the 802 teraflops Frontier machine were a little bit vague back in September, but the university confirmed to El Reg that it comprises 288 server nodes using the eight-core variants of the forthcoming Xeon E5.

Each node has two sockets and 128GB of main memory, for a total of 36TB of memory on the CPU side, and each also has four of Nvidia's Tesla M2090 server-cooled, fanless GPU coprocessors, with each GPU being equipped with 6GB of its own GDDR5 graphics memory. In terms of peak theoretical capacity, only about 11 percent of the aggregate performance of the cluster comes from the Sandy Bridge processors, with the rest coming from the Tesla GPUs.
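For those who like to check the sums, here is a minimal Python sketch that scales the per-node configuration up to the whole cluster. The node count, memory sizes, and GPU count are the figures quoted above; the function name and the choice of 1,024GB to the terabyte are our own assumptions, not anything Appro or the university has published.

```python
# Node-level figures quoted in the article.
NODES = 288
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 8            # eight-core Xeon E5 variants
MAIN_MEMORY_GB_PER_NODE = 128
GPUS_PER_NODE = 4               # Tesla M2090
GDDR5_GB_PER_GPU = 6


def cluster_totals(nodes: int = NODES) -> dict:
    """Scale the per-node configuration up to the whole cluster."""
    gpus = nodes * GPUS_PER_NODE
    return {
        "cpu_cores": nodes * SOCKETS_PER_NODE * CORES_PER_SOCKET,
        "gpus": gpus,
        # Assumption: 1 TB is taken as 1,024 GB to match the article's rounding.
        "main_memory_tb": nodes * MAIN_MEMORY_GB_PER_NODE / 1024,
        "gpu_memory_gb": gpus * GDDR5_GB_PER_GPU,
    }


if __name__ == "__main__":
    for name, value in cluster_totals().items():
        print(f"{name}: {value}")
```

On those inputs the main memory comes to 36TB, matching the figure above, and the GPU side adds 1,152 Teslas with a little under 7TB of GDDR5 between them.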

In real-world situations thus far, however, hybrid supers have had very poor efficiency because of the trickiness of the links between the CPUs and the GPUs. Because the Xeon E5s support PCI-Express 3.0 peripheral slots – and do so right on the CPU chip itself – there is every reason to believe that the doubling of bandwidth between the GPU and CPU will help all hybrid Xeon-Tesla machines get closer to their peak theoretical performance, but we won't know until the Xeon E5 machines are out and supercomputer centers start conducting their tests and publishing their results.
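To put a rough number on that doubling, the sketch below compares the usable one-way bandwidth of a PCI-Express 2.0 slot with that of a 3.0 slot. The signalling rates and line encodings are the standard ones for each generation rather than anything stated by Appro, and the x16 slot width is an assumption on our part, since the article does not say how the Teslas are wired up.

```python
def lane_bandwidth_gbytes(transfer_rate_gt: float, payload_bits: int, encoded_bits: int) -> float:
    """Usable one-direction bandwidth of a single lane, in GB/s, after encoding overhead."""
    return transfer_rate_gt * payload_bits / encoded_bits / 8


LANES = 16  # assumption: each GPU sits in an x16 slot

# PCI-Express 2.0: 5 GT/s per lane with 8b/10b encoding.
pcie2 = lane_bandwidth_gbytes(5.0, 8, 10) * LANES
# PCI-Express 3.0: 8 GT/s per lane with 128b/130b encoding.
pcie3 = lane_bandwidth_gbytes(8.0, 128, 130) * LANES

print(f"PCIe 2.0 x16: {pcie2:.2f} GB/s per direction")
print(f"PCIe 3.0 x16: {pcie3:.2f} GB/s per direction")
print(f"Speed-up: {pcie3 / pcie2:.2f}x")
```

The raw signalling rate only rises from 5 to 8 GT/s; most of the rest of the gain comes from the leaner encoding, which is why the result lands just shy of a clean doubling.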

While efficiency is important, so is the fact that by going heavy on the GPUs, the Frontier cluster fits in 26 racks and only burns 400 kilowatts of juice. That works out to about 2,005 megaflops per watt peak. By comparison, IBM's most efficient supercomputer – the BlueGene/Q massively parallel CPU beast launched last fall at SC11 – is designed to get 20 petaflops of peak performance out of 6.6 megawatts, or about 3,030 megaflops per watt.
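The arithmetic behind those two efficiency figures is simple enough to sketch in a few lines of Python. The peak performance and power numbers are the ones quoted above; the function name is ours, and these are peak theoretical ratings, not measured results.

```python
def megaflops_per_watt(peak_flops: float, power_watts: float) -> float:
    """Peak floating point operations per second per watt, expressed in megaflops."""
    return peak_flops / power_watts / 1e6


# Frontier: 802 teraflops peak at 400 kilowatts.
frontier = megaflops_per_watt(802e12, 400e3)
# BlueGene/Q design point: 20 petaflops peak at 6.6 megawatts.
bluegene_q = megaflops_per_watt(20e15, 6.6e6)

print(f"Frontier:   {frontier:,.0f} megaflops per watt")
print(f"BlueGene/Q: {bluegene_q:,.0f} megaflops per watt")
print(f"BlueGene/Q advantage: {bluegene_q / frontier:.2f}x")
```

On these numbers BlueGene/Q holds roughly a 1.5x edge in peak flops per watt.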

Without the GPUs, x86 CPU clusters can't even come close to BlueGene/Q, but with them (or with other accelerators, as was the case with the hybrid Opteron-Cell "Roadrunner" super built by IBM for Los Alamos National Lab), hybrids can get better performance per watt than CPU-only machines. And with PCI-Express 3.0, you can hang four GPUs off a two-socket server and have enough bandwidth to talk between the two kinds of compute engine.

The Frontier cluster is configured with a dual-rail Quad Data Rate (QDR) InfiniBand network made up of ConnectX-3 network interface cards from Mellanox on the servers, and Mellanox QDR switches lashing the machines together. The nodes are linked in a fat tree topology. The servers are hooked up to a little more than a half petabyte of S2A storage arrays from DataDirect Networks.

As an adjunct to the cluster, the university is working on its own direct interconnect electronics to let the GPUs in the cluster talk directly with each other and share data without having to bother the CPU. Tsukuba techies have cooked up a little something called the PCI Express Adaptive Communication Hub, or PEACH, which acts as a PCI controller linking the GPUs to each other. Boffins are at work on an improved PEACH2 hub, which is implemented on an FPGA, and which will make use of Nvidia's GPU-Direct protocol to create what is in effect a GPU switch.

"The largest issue in accelerated computing is how to fill the gap between its powerful internal computation performance and relatively poor external communication performance," Taisuke Boku, deputy director of the Center for Computational Sciences at the university, explained in an email to El Reg. "To overcome this problem, we need to develop various new algorithms, shifting from traditional ones for our target applications. In some applications, we may need a paradigm shift from scratch toward a new generation of algorithms. HA-PACS will be the testbed for developing of these algorithms. For this purpose, we need to use the system constantly in large scale for selected applications."

The PEACH2 GPU-PCI switches (well, that is essentially what they are) will eventually be plugged into a bunch more nodes, and these will be plugged into the Frontier machine sometime in early 2013, adding something on the order of 200 teraflops and pushing the machine up above the petaflops barrier. The lessons learned from the GPU direct connections will lay the groundwork for exascale systems, the university hopes. ®
