
Barcelona taps Tesla and Tegra for next-gen hybrid supercomputer

Singing the praises of ceepie-geepies for PRACE


The Barcelona Supercomputing Center has been monkeying around with the combination of low-powered processors and relatively low-end graphics chips. Well, BSC is getting ready to take its ceepie-geepie prototyping up another notch by marrying a baby ARM processor aimed at smartphones and tablets with a full-on GPU coprocessor.

BSC is the Church of the Ceepie-Geepie, quite literally


The prototype cluster, which is to be called Pedraforca, will use the existing Tegra 3 processor from Nvidia, as implemented on the "Kayla" system launched by the chipmaker in conjunction with motherboard maker SECO back in March at the GPU Technology Conference.

The predecessor Carma system from Nvidia (short for CUDA-ARM), also made with SECO, put a four-core Tegra 3 chip based on the Cortex-A9 processor running at 1.5GHz on a mobo and linked it to a GeForce GT520MX mobile GPU coprocessor with 48 cores running at 900MHz and delivering 142 gigaflops of floating point processing at single precision.

With the Kayla system board, which comes in a MiniITX form factor, the CPU side is again a Tegra 3 chip from Nvidia, but the board has a PCI Express 2.0 x16 link to hook a full-on Tesla GPU to the ARM processor.

BSC collaborated with Nvidia and SECO to create the Kayla board, and built a prototype machine – its second ARM-GPU hybrid – on that card last fall. Tesla coprocessors based on Nvidia's GF108, GK104, and GK107 graphics processors are supported with the Kayla system.

It would be nice to have a faster PCI-Express link to hook the CPU and GPU together, and having only 2GB of memory for four cores might be a little skinny, too. A gigabit Ethernet link is not going to break any performance barriers, either. But at €349 per Kayla system, experimenting and seeing how software could run on such a ceepie-geepie is not exactly going to bust the budget, not even for a dense-packed rack of these little beasties.

The Tegra 3 ARM board and Tesla K20 GPU powering Pedraforca

With the Pedraforca system, the third generation of ARM-based ceepie-geepies to be prototyped by BSC, the supercomputer center will again use a board that has a Tegra 3 processor. Sumit Gupta, general manager of the Tesla Accelerated Computing business unit at Nvidia, says that the nodes will use a Tesla K20 to do the ARM CPU's math homework.

The ARM processor nodes will be linked to each other using 40Gb/sec InfiniBand adapters and switches from Mellanox Technologies, giving the cluster a substantial performance boost. This is not just because of the increase in bandwidth, but thanks to Remote Direct Memory Access (RDMA) in the InfiniBand protocol, which will allow the CPUs to talk to each other over the network without having to go through the network software stack in the Linux operating system on the cluster. And, thanks to the much-improved GPUDirect feature in the "Kepler" GPUs from Nvidia, the GPUs can talk over InfiniBand to each other without speaking to the CPU, too.
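To put the bandwidth jump in rough numbers, here is a back-of-the-envelope sketch in Python. The 1Gb/sec and 40Gb/sec line rates come from the figures above; the encoding overheads are assumptions (the InfiniBand link is taken to be 4x QDR with 8b/10b encoding, and Ethernet framing is ignored), and the function name is made up for illustration.

```python
# Back-of-the-envelope link comparison. Line rates are from the article;
# the encoding efficiencies are assumptions about the link types involved.

def payload_gbps(line_rate_gbps: float, encoding_efficiency: float) -> float:
    """Usable data rate in Gb/s after line-encoding overhead."""
    return line_rate_gbps * encoding_efficiency

# Gigabit Ethernet: treat 1 Gb/s as the data rate (assumption: framing ignored).
gige = payload_gbps(1.0, 1.0)

# 40 Gb/s InfiniBand, assumed to be 4x QDR with 8b/10b encoding (80% efficient).
infiniband = payload_gbps(40.0, 8 / 10)

speedup = infiniband / gige
print(f"GigE: {gige:.0f} Gb/s, InfiniBand: {infiniband:.0f} Gb/s, ratio: {speedup:.0f}x")
```

This only compares the wires. What a node actually sees will also depend on the PCI-Express link between the Tegra chip and the adapter, which this sketch does not model.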

This, as it turns out, is important, as are the Hyper-Q and Dynamic Parallelism features of the high-end Kepler GPUs from Nvidia.

"Fermi-class GPUs were too limited and we had to rely too much on the Tegra CPU with them," explains Alex Ramirez, leader of the Heterogeneous Architectures Research Group at BSC, to El Reg. But with GPUDirect combined with InfiniBand and Hyper-Q (which allows the ARM CPU to queue up 32 MPI tasks at the same time on the Kepler GPU instead of one MPI task that was allowed on a Fermi GPU) and Dynamic Parallelism (which lets the GPU schedule its own work without asking the CPU for permission), now BSC can start testing software in earnest on a ceepie-geepie.
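As a rough illustration of why the number of work queues matters, consider the toy model below. It is not how the GPU scheduler works: it assumes every task is small enough that all queued tasks can run side by side on the GPU, and the task count and duration are hypothetical values picked for the example.

```python
import math

def batch_time(num_tasks: int, task_seconds: float, hardware_queues: int) -> float:
    """Idealized wall time: tasks run in waves of `hardware_queues` at a time."""
    waves = math.ceil(num_tasks / hardware_queues)
    return waves * task_seconds

tasks = 32          # hypothetical number of MPI tasks feeding one GPU
task_seconds = 0.5  # hypothetical duration of each small task

# One queue (Fermi-style) versus 32 queues (Kepler with Hyper-Q).
one_queue = batch_time(tasks, task_seconds, hardware_queues=1)
many_queues = batch_time(tasks, task_seconds, hardware_queues=32)
print(f"1 queue: {one_queue} s, 32 queues: {many_queues} s")
```

Under these idealized assumptions the single queue serializes everything, while 32 queues finish in one wave. Real gains will be smaller whenever individual tasks are big enough to fill the GPU on their own.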

For workloads that don't bother the CPU much, Ramirez says that the Pedraforca cluster should get just about the same performance as a Xeon cluster that is offloading most of its work to the GPU, but without all that Xeon heat and cost.

You might be wondering how a Kayla system from SECO can have both a Tesla K20 GPU coprocessor and an InfiniBand ConnectX-2 adapter card plugged into it, since there is only one x16 slot. Ramirez says that BSC is getting a PCI-Express switch from PLX Technology, putting it into that single slot, and then plugging the InfiniBand adapters and Tesla GPUs into the switch. It looks like there will be multiple PLX switches in the cluster, which Ramirez hopes to scale up to 128 nodes when it is installed in July.

The Pedraforca machine is partially funded by the Partnership for Advanced Computing in Europe (PRACE) initiative. The compute nodes will be manufactured by E4 Computer Engineering and Bull is being hired to do the system integration. ®
