Japanese boffins fire up 802 teraflops ceepie-geepie

Another Xeon E5 machine, with some FPGA special sauce

Upstart supercomputer maker Appro International has started up another Xeon E5-based supercomputer, this one a hybrid CPU-GPU Xtreme-X machine installed at the University of Tsukuba in Japan.

The deal at Tsukuba was announced last September at about the same time that Intel was widely expected to launch the "Sandy Bridge-EP" Xeon E5 processors, which are now due for launch sometime before the end of the first quarter. It is not clear how Appro was able to jump to the head of the Xeon E5 line, but it is worth noting that Appro now uses Intel motherboards as well as processors in its Xtreme-X clusters.

The University of Tsukuba's "Frontier" CPU-GPU supercomputer

The Tsukuba machine is nicknamed "Frontier" but officially called the much more boring HA-PACS, short for Highly Accelerated Parallel Advanced system for Computational Sciences. The feeds and speeds of the 802 teraflops Frontier machine were a little bit vague back in September, but the university confirmed to El Reg that it comprises 288 server nodes using the eight-core variants of the forthcoming Xeon E5.

Each node has two sockets and 128GB of main memory, for a total of 36TB of memory on the CPU side, and each also has four of Nvidia's Tesla M2090 server-cooled, fanless GPU coprocessors, with each GPU being equipped with 6GB of its own GDDR5 graphics memory. In terms of peak theoretical capacity, only about 11 percent of the aggregate performance of the cluster comes from the Sandy Bridge processors, with the rest coming from the Tesla GPUs.
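
The "about 11 percent" figure can be reproduced with a back-of-envelope sketch of per-node peak double-precision performance. The node, socket, core, memory, and GPU counts below come from the text; everything else is an assumption, since the article does not name the Xeon E5 SKU. The sketch assumes a 2.6GHz clock, eight double-precision flops per cycle per core (AVX), and Nvidia's published 665 gigaflops double-precision peak for the Tesla M2090.

```python
# Back-of-envelope peak figures for one Frontier (HA-PACS) node.
# ASSUMPTIONS (not stated in the article): 2.6 GHz Xeon E5 clock,
# 8 double-precision flops per cycle per core (AVX),
# 665 GFLOPS double-precision peak per Tesla M2090.
ASSUMED_CLOCK_GHZ = 2.6
ASSUMED_DP_FLOPS_PER_CYCLE = 8
ASSUMED_M2090_DP_GFLOPS = 665.0

# Figures taken from the article.
NODES = 288
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 8
GPUS_PER_NODE = 4
CPU_MEM_GB_PER_NODE = 128
GPU_MEM_GB_PER_GPU = 6


def node_peak_gflops():
    """Return (cpu_gflops, gpu_gflops): peak double precision for one node."""
    cpu = (SOCKETS_PER_NODE * CORES_PER_SOCKET
           * ASSUMED_CLOCK_GHZ * ASSUMED_DP_FLOPS_PER_CYCLE)
    gpu = GPUS_PER_NODE * ASSUMED_M2090_DP_GFLOPS
    return cpu, gpu


cpu, gpu = node_peak_gflops()
cpu_share = cpu / (cpu + gpu)  # fraction of node peak supplied by the CPUs
print(f"CPU share of peak: {cpu_share:.1%}")
print(f"Total CPU memory: {NODES * CPU_MEM_GB_PER_NODE} GB")
print(f"Total GPU memory: {NODES * GPUS_PER_NODE * GPU_MEM_GB_PER_GPU} GB")
```

Under those assumptions the CPUs supply about 11.1 percent of each node's peak. The memory totals come to 36,864GB on the CPU side (36TB counted in binary units) and 6,912GB of GDDR5 spread across the GPUs.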

In real-world situations thus far, however, hybrid supers have had very poor efficiency because of the trickiness of the links between the CPUs and the GPUs. The Xeon E5s support PCI-Express 3.0 peripheral slots, and do so right on the CPU chip itself, so there is every reason to believe that the doubling of bandwidth between the GPU and CPU will help all hybrid Xeon-Tesla machines get closer to their peak theoretical performance. But we won't know until the Xeon E5 machines are out and supercomputer centers start conducting their tests and publishing their results.
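
The doubling comes from PCI-Express 3.0 raising the signalling rate from 5 to 8 gigatransfers per second per lane while swapping the 8b/10b line code for the leaner 128b/130b one. The sketch below works through that arithmetic for an x16 slot, the width a GPU card normally uses. It counts only line-encoding overhead, so real links deliver somewhat less once packet headers and flow control are included.

```python
# Per-direction data rate of a PCI-Express link, from the per-lane
# signalling rate and the line-encoding overhead of each generation.
GENERATIONS = {
    # name: (gigatransfers per second per lane, payload bits, encoded bits)
    "PCIe 2.0": (5.0, 8, 10),     # 8b/10b encoding
    "PCIe 3.0": (8.0, 128, 130),  # 128b/130b encoding
}


def link_gbytes_per_s(gen: str, lanes: int = 16) -> float:
    """Per-direction rate in GB/s, ignoring packet and protocol overhead."""
    gtps, payload, encoded = GENERATIONS[gen]
    bits_per_s = gtps * 1e9 * payload / encoded * lanes
    return bits_per_s / 8 / 1e9


for gen in GENERATIONS:
    print(f"{gen} x16: {link_gbytes_per_s(gen):.2f} GB/s per direction")

ratio = link_gbytes_per_s("PCIe 3.0") / link_gbytes_per_s("PCIe 2.0")
print(f"Gen3 / Gen2: {ratio:.2f}x")
```

That works out to 8GB/s versus roughly 15.75GB/s per direction, a ratio of about 1.97, which is close enough to call a doubling.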

While efficiency is important, so is the fact that by going heavy on the GPUs, the Frontier cluster fits in 26 racks and only burns 400 kilowatts of juice. That works out to about 2,005 megaflops per watt peak. By comparison, IBM's most efficient supercomputer – the BlueGene/Q massively parallel CPU beast launched last fall at SC11 – is designed to get 20 petaflops of peak performance out of 6.6 megawatts, or about 3,030 megaflops per watt.
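
Both per-watt numbers are simply peak flops divided by power draw. The quick check below uses the figures quoted in the text. They are peak ratings, not sustained performance on real workloads.

```python
def mflops_per_watt(peak_flops: float, watts: float) -> float:
    """Peak floating-point operations per second per watt, in megaflops."""
    return peak_flops / watts / 1e6


frontier = mflops_per_watt(802e12, 400e3)   # 802 teraflops, 400 kilowatts
bluegene_q = mflops_per_watt(20e15, 6.6e6)  # 20 petaflops, 6.6 megawatts
print(f"Frontier:   {frontier:,.0f} MFLOPS/W")
print(f"BlueGene/Q: {bluegene_q:,.0f} MFLOPS/W")
```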

Without the GPUs, x86 CPU clusters can't even come close to BlueGene/Q, but with them (as was the case with the hybrid Opteron-Cell "Roadrunner" super built by IBM for Los Alamos National Lab), hybrids get much better performance per watt than CPU-only clusters. And with PCI-Express 3.0, you can hang four GPUs off a two-socket server and have enough bandwidth to talk between the two compute engines.

The Frontier cluster is configured with a dual-rail Quad Data Rate (QDR) InfiniBand network built from ConnectX-3 network interface cards from Mellanox on the servers, and Mellanox QDR switches lashing the machines together. The nodes are linked in a fat-tree topology. The servers are connected to a little more than a half petabyte of S2A storage arrays from DataDirect Networks.

As an adjunct to the cluster, the university is working on its own direct interconnect electronics to let the GPUs in the cluster talk directly with each other and share data without having to bother the CPU. Tsukuba techies have cooked up a little something called the PCI Express Adaptive Communication Hub, or PEACH, which acts as a PCI controller linking the GPUs to each other. Boffins are at work on an improved PEACH2 hub, which is implemented on an FPGA and which will make use of Nvidia's GPUDirect protocol to create what is in effect a GPU switch.

"The largest issue in accelerated computing is how to fill the gap between its powerful internal computation performance and relatively poor external communication performance," Taisuke Boku, deputy director of the Center for Computational Sciences at the university, explained in an email to El Reg. "To overcome this problem, we need to develop various new algorithms, shifting from traditional ones for our target applications. In some applications, we may need a paradigm shift from scratch toward a new generation of algorithms. HA-PACS will be the testbed for developing of these algorithms. For this purpose, we need to use the system constantly in large scale for selected applications."

The PEACH2 GPU-PCI switches (well, that is essentially what they are) will eventually be installed in a batch of additional nodes, which will be added to the Frontier machine sometime in early 2013, adding something on the order of 200 teraflops and pushing the machine up above the petaflops barrier. The lessons learned from the direct GPU connections will lay the groundwork for exascale systems, the university hopes. ®
