
Japanese boffins fire up 802 teraflops ceepie-geepie

Another Xeon E5 machine, with some FPGA special sauce


Upstart supercomputer maker Appro International has started up another Xeon E5-based supercomputer, this one a hybrid CPU-GPU Xtreme-X machine installed at the University of Tsukuba in Japan.

The deal at Tsukuba was announced last September at about the same time that Intel was widely expected to launch the "Sandy Bridge-EP" Xeon E5 processors, which are now due for launch sometime before the end of the first quarter. It is not clear how Appro was able to jump to the head of the Xeon E5 line, but it is worth noting that Appro now uses Intel motherboards as well as processors in its Xtreme-X clusters.


The University of Tsukuba's "Frontier" CPU-GPU supercomputer

The Tsukuba machine is nicknamed "Frontier" but officially called the much more boring HA-PACS, short for Highly Accelerated Parallel Advanced system for Computational Sciences. The feeds and speeds of the 802 teraflops Frontier machine were a little vague back in September, but the university has confirmed to El Reg that it comprises 288 server nodes using the eight-core variants of the forthcoming Xeon E5.

Each node has two sockets and 128GB of main memory, for a total of 36TB of memory on the CPU side, and each also has four of Nvidia's Tesla M2090 server-cooled, fanless GPU coprocessors, with each GPU being equipped with 6GB of its own GDDR5 graphics memory. In terms of peak theoretical capacity, only about 11 percent of the aggregate performance of the cluster comes from the Sandy Bridge processors, with the rest coming from the Tesla GPUs.
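
As a rough sanity check on that 11 per cent figure, the per-node split can be sketched out from the published parts. This is back-of-the-envelope only: the article does not name the exact Xeon E5 SKU, so the 2.6GHz clock and the 8 double-precision flops per clock per core are our assumptions, while 665 gigaflops is Nvidia's published double-precision peak for the Tesla M2090.

```python
# Back-of-the-envelope peak-flops split for a single Frontier node.
# ASSUMPTIONS: eight-core "Sandy Bridge-EP" Xeon E5 at 2.6GHz doing
# 8 double-precision flops per clock per core (AVX). 665 GF/s is the
# published DP peak of the Tesla M2090.

sockets, cores, clock_ghz, flops_per_clock = 2, 8, 2.6, 8
cpu_gflops = sockets * cores * clock_ghz * flops_per_clock   # ~333 GF/s per node

gpus_per_node, m2090_gflops = 4, 665
gpu_gflops = gpus_per_node * m2090_gflops                    # 2,660 GF/s per node

node_total = cpu_gflops + gpu_gflops
print(f"CPU share of node peak: {100 * cpu_gflops / node_total:.1f}%")  # ~11%
```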

In real-world situations thus far, however, hybrid supers have had very poor efficiency because of the trickiness of the links between the CPUs and the GPUs. Because the Xeon E5s support PCI-Express 3.0 peripheral slots – and do so right on the CPU chip itself – there is every reason to believe that the doubling of bandwidth between the GPU and CPU will help all hybrid Xeon-Tesla machines get closer to their peak theoretical performance, but we won't know until the Xeon E5 machines are out and supercomputer centers start conducting their tests and publishing their results.
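
The "doubling" claim is easy to check against the published PCI-Express signalling rates; the sketch below assumes the usual x16 slot per GPU.

```python
# Per-direction bandwidth of a 16-lane PCI-Express slot.
# PCIe 2.0: 5 GT/s per lane with 8b/10b encoding (80% efficient).
# PCIe 3.0: 8 GT/s per lane with 128b/130b encoding (~98.5% efficient).

lanes = 16
pcie2_gbytes = lanes * 5e9 * (8 / 10) / 8 / 1e9     # 8.0 GB/s
pcie3_gbytes = lanes * 8e9 * (128 / 130) / 8 / 1e9  # ~15.75 GB/s

print(f"PCIe 2.0 x16: {pcie2_gbytes:.2f} GB/s")
print(f"PCIe 3.0 x16: {pcie3_gbytes:.2f} GB/s")
print(f"ratio: {pcie3_gbytes / pcie2_gbytes:.2f}x")  # just under 2x
```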

While efficiency is important, so is the fact that by going heavy on the GPUs, the Frontier cluster fits in 26 racks and only burns 400 kilowatts of juice. That works out to about 2,005 megaflops per watt peak. By comparison, IBM's most efficient supercomputer – the BlueGene/Q massively parallel CPU beast launched last fall at SC11 – is designed to get 20 petaflops of peak performance out of 6.6 megawatts, or about 3,030 megaflops per watt.
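
The flops-per-watt arithmetic takes only a few lines to verify; note these are peak figures, not measured Linpack numbers.

```python
# Peak flops per watt, using the figures quoted above.
frontier_mflops = 802e6       # 802 teraflops expressed in megaflops
frontier_watts = 400_000      # 400 kilowatts

bluegene_q_mflops = 20e9      # 20 petaflops in megaflops
bluegene_q_watts = 6.6e6      # 6.6 megawatts

print(f"Frontier:   {frontier_mflops / frontier_watts:,.0f} MF/W")     # ~2,005
print(f"BlueGene/Q: {bluegene_q_mflops / bluegene_q_watts:,.0f} MF/W") # ~3,030
```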

Without the GPUs, x86 CPU clusters can't even come close to BlueGene/Q, but with them (as was the case with the hybrid Opteron-Cell "Roadrunner" super built by IBM for Los Alamos National Lab), hybrids can get far better performance per watt than CPU-only machines. And with PCI-Express 3.0, you can hang four GPUs off a two-socket server and have enough bandwidth to talk between the two compute engines.

The Frontier cluster is configured with a dual-rail Quad Data Rate (QDR) InfiniBand network, with ConnectX-3 network interface cards from Mellanox in the servers and Mellanox QDR switches lashing the machines together; the nodes are linked in a fat tree topology. The servers are connected to a little more than half a petabyte of S2A storage arrays from DataDirect Networks.

As an adjunct to the cluster, the university is working on its own direct-interconnect electronics to let the GPUs in the cluster talk directly with each other and share data without having to bother the CPU. Tsukuba techies have cooked up a little something called the PCI Express Adaptive Communication Hub, or PEACH, which acts as a PCI-Express controller linking the GPUs to each other. Boffins are at work on an improved PEACH2 hub, implemented on an FPGA, which will make use of Nvidia's GPUDirect protocol to create what is in effect a GPU switch.

"The largest issue in accelerated computing is how to fill the gap between its powerful internal computation performance and relatively poor external communication performance," Taisuke Boku, deputy director of the Center for Computational Sciences at the university, explained in an email to El Reg. "To overcome this problem, we need to develop various new algorithms, shifting from traditional ones for our target applications. In some applications, we may need a paradigm shift from scratch toward a new generation of algorithms. HA-PACS will be the testbed for developing of these algorithms. For this purpose, we need to use the system constantly in large scale for selected applications."

The PEACH2 GPU-PCI switches (well, that is essentially what they are) will eventually be plugged into a bunch more nodes, and these will be added to the Frontier machine sometime in early 2013, contributing something on the order of 200 teraflops and pushing the machine up above the petaflops barrier. The lessons learned from these direct GPU connections will lay the groundwork for exascale systems, the university hopes. ®
