Japanese boffins fire up 802 teraflops ceepie-geepie

Another Xeon E5 machine, with some FPGA special sauce

Upstart supercomputer maker Appro International has started up another Xeon E5-based supercomputer, this one a hybrid CPU-GPU Xtreme-X machine installed at the University of Tsukuba in Japan.

The deal at Tsukuba was announced last September at about the same time that Intel was widely expected to launch the "Sandy Bridge-EP" Xeon E5 processors, which are now due for launch sometime before the end of the first quarter. It is not clear how Appro was able to jump to the head of the Xeon E5 line, but it is worth noting that Appro now uses Intel motherboards as well as processors in its Xtreme-X clusters.

The University of Tsukuba's "Frontier" CPU-GPU supercomputer

The Tsukuba machine is nicknamed "Frontier" but officially called the much more boring HA-PACS, short for Highly Accelerated Parallel Advanced system for Computational Sciences. The feeds and speeds of the 802 teraflops Frontier machine were a little bit vague back in September, but the university confirmed to El Reg that it comprises 288 server nodes using the eight-core variants of the forthcoming Xeon E5.

Each node has two sockets and 128GB of main memory, for a total of 36TB of memory on the CPU side, and each also has four of Nvidia's Tesla M2090 server-cooled, fanless GPU coprocessors, with each GPU being equipped with 6GB of its own GDDR5 graphics memory. In terms of peak theoretical capacity, only about 11 percent of the aggregate performance of the cluster comes from the Sandy Bridge processors, with the rest coming from the Tesla GPUs.
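As a quick check on those capacity figures, here is a minimal sketch in Python that totals the memory from the per-node numbers quoted above. The variable names are illustrative, and the conversion assumes binary units (1TB = 1,024GB), which is what makes the CPU-side total come out at 36TB.

```python
# Per-node figures as quoted in the article.
NODES = 288
CPU_MEM_PER_NODE_GB = 128
GPUS_PER_NODE = 4
GPU_MEM_PER_GPU_GB = 6

# Assumption: binary units, 1TB = 1,024GB.
GB_PER_TB = 1024

cpu_mem_tb = NODES * CPU_MEM_PER_NODE_GB / GB_PER_TB
gpu_count = NODES * GPUS_PER_NODE
gpu_mem_tb = gpu_count * GPU_MEM_PER_GPU_GB / GB_PER_TB

print(f"CPU-side memory: {cpu_mem_tb:.2f}TB")
print(f"GPUs in cluster: {gpu_count}")
print(f"GPU-side GDDR5:  {gpu_mem_tb:.2f}TB")
```

On those numbers the cluster carries 1,152 GPUs, whose GDDR5 adds up to a bit under 7TB on top of the 36TB of main memory.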

In real-world situations thus far, however, hybrid supers have had very poor efficiency because of the trickiness of the links between the CPUs and the GPUs. Because the Xeon E5s support PCI-Express 3.0 peripheral slots – and do so right on the CPU chip itself – there is every reason to believe that the doubling of bandwidth between the GPU and CPU will help all hybrid Xeon-Tesla machines get closer to their peak theoretical performance, but we won't know until the Xeon E5 machines are out and supercomputer centers start conducting their tests and publishing their results.
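The "doubling of bandwidth" can be sketched from the published PCI-Express signalling rates. The figures below are not from the university or Appro; they are the standard per-lane numbers (5GT/sec with 8b/10b encoding for PCI-Express 2.0, 8GT/sec with 128b/130b encoding for 3.0), and the x16 slot width is an assumption about how the GPUs are attached.

```python
def lane_bandwidth_gbytes(transfer_rate_gt: float, payload_bits: int, total_bits: int) -> float:
    """Usable bandwidth of one lane, one direction, in GB/sec.

    transfer_rate_gt is the raw signalling rate in gigatransfers/sec;
    payload_bits / total_bits is the line-encoding efficiency.
    """
    return transfer_rate_gt * payload_bits / total_bits / 8


LANES = 16  # assumption: each GPU sits in an x16 slot

gen2_x16 = LANES * lane_bandwidth_gbytes(5.0, 8, 10)     # PCI-Express 2.0
gen3_x16 = LANES * lane_bandwidth_gbytes(8.0, 128, 130)  # PCI-Express 3.0
ratio = gen3_x16 / gen2_x16

print(f"PCIe 2.0 x16: {gen2_x16:.2f}GB/sec per direction")
print(f"PCIe 3.0 x16: {gen3_x16:.2f}GB/sec per direction")
print(f"Speed-up:     {ratio:.2f}x")
```

The raw signalling rate rises by only 60 percent; the rest of the gain comes from the leaner encoding, which is why the result lands just short of a clean 2x. These are link-level ceilings, not measured application throughput.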

While efficiency is important, so is the fact that by going heavy on the GPUs, the Frontier cluster fits in 26 racks and only burns 400 kilowatts of juice. That works out to about 2,005 megaflops per watt peak. By comparison, IBM's most efficient supercomputer – the BlueGene/Q massively parallel CPU beast launched last fall at SC11 – is designed to get 20 petaflops of peak performance out of 6.6 megawatts, or about 3,030 megaflops per watt.
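The performance-per-watt comparison is simple division, and a short Python sketch reproduces it from the peak and power figures quoted above. The function name is illustrative.

```python
def megaflops_per_watt(peak_flops: float, power_watts: float) -> float:
    """Peak floating-point operations per second per watt, in megaflops."""
    return peak_flops / power_watts / 1e6


# Frontier: 802 teraflops peak at 400 kilowatts.
frontier = megaflops_per_watt(802e12, 400e3)

# BlueGene/Q design point: 20 petaflops peak at 6.6 megawatts.
bluegene_q = megaflops_per_watt(20e15, 6.6e6)

print(f"Frontier:   {frontier:,.0f} megaflops per watt")
print(f"BlueGene/Q: {bluegene_q:,.0f} megaflops per watt")
```

Both figures are peak theoretical numbers, so the gap says nothing about how much of that peak either machine sustains on real workloads.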

Without the GPUs, x86 CPU clusters can't even come close to BlueGene/Q, but with them (as was the case with the hybrid Opteron-Cell "Roadrunner" super built by IBM for Los Alamos National Lab), hybrids can get better performance per watt. And with PCI-Express 3.0, you can hang four GPUs off a two-socket server and have enough bandwidth to talk between the two compute engines.

The Frontier cluster is configured with a dual-rail Quad Data Rate (QDR) InfiniBand network made up of ConnectX-3 network interface cards from Mellanox on the servers and Mellanox QDR switches lashing the machines together. The nodes are arranged in a fat tree topology. The servers are linked to a little more than half a petabyte of S2A storage arrays from DataDirect Networks.

As an adjunct to the cluster, the university is working on its own direct interconnect electronics to let the GPUs in the cluster talk directly to each other and share data without having to bother the CPU. Tsukuba techies have cooked up a little something called the PCI Express Adaptive Communication Hub, or PEACH, which acts as a PCI controller linking the GPUs to each other. Boffins are at work on an improved PEACH2 hub, which is implemented on an FPGA, and which will make use of Nvidia's GPU-Direct protocol to create what is in effect a GPU switch.

"The largest issue in accelerated computing is how to fill the gap between its powerful internal computation performance and relatively poor external communication performance," Taisuke Boku, deputy director of the Center for Computational Sciences at the university, explained in an email to El Reg. "To overcome this problem, we need to develop various new algorithms, shifting from traditional ones for our target applications. In some applications, we may need a paradigm shift from scratch toward a new generation of algorithms. HA-PACS will be the testbed for developing of these algorithms. For this purpose, we need to use the system constantly in large scale for selected applications."

The PEACH2 GPU-PCI switches (well, that is essentially what they are) will eventually be plugged into a bunch more nodes, and these will be plugged into the Frontier machine sometime in early 2013, adding something on the order of 200 teraflops and pushing the machine up above the petaflops barrier. The lessons learned from the GPU direct connections will lay the groundwork for exascale systems, the university hopes. ®
