Calm a CARMA drama chameleon: Barça super waves ARMs, GPUs

Cluster perks up with low-power CPUs bossing graphics chips

GTC 2013 Over the last few years, we’ve seen a steadily growing buzz surrounding the use of ARM processors in PCs, servers and supercomputers.

Here, at this year's GPU Technology Conference in California, that buzz is even more pronounced. This is due to Nvidia's upcoming 64-bit "Project Denver" ARM cores, and advances in its graphics chips to make machines even less dependent on a fast and powerful (read: Intel Xeon) processor feeding data to number-crunching GPU beasts. El Reg's Rik Myslewski penned a great article on GTC 2013 ARM chatter here.

While everyone has been debating and speculating about what it would be like to combine Brit-designed ARM cores and GPU accelerators, one organisation has put together some hardware to separate the theoretical from the real. The Barcelona Supercomputing Center (that's Barcelona in Spain, not the other one) is building clusters to explore the potential advantages of combining nippy ARM-compatible chips with fast number-crunching GPUs.

The centre's first attempt, the Tibidabo, was a proof-of-concept system to determine whether it’s possible to build an all-ARM-based cluster. Could they really put together a cluster based on a low-power processor family that's ideally suited to mobile phones, hard drives and handheld games? And, if they could build it, could they find or adapt enough software for it to do useful work?

They built a two-rack cluster containing 32 blades, 256 nodes and a total of 512 Tegra 2 ARM cores. They also ported 11 scientific apps to the ARM architecture with little difficulty, although they did need to fiddle around with the memory hierarchy to optimise some of them.

The performance wasn’t all that great. The total system turned out 512 billion floating-point calculations a second (512GFLOPS) while consuming 3.4kW, yielding 0.15GFLOPS/watt. For context, the best systems on the most recent Green500 list - the top 500 supercomputers ranked by energy efficiency - come in around 2.4 or 2.5GFLOPS/watt; the systems at the end of the list are rated at 0.033GFLOPS/watt.
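For readers who want to check the sums, here is a minimal sketch of the efficiency arithmetic using only the figures quoted above. The variable and function names are illustrative, not anything the Barcelona team published.

```python
# Figures as quoted in the article; names here are illustrative only.
ARM_CLUSTER_GFLOPS = 512.0    # total throughput of the all-ARM cluster
ARM_CLUSTER_WATTS = 3400.0    # 3.4kW power draw
GREEN500_BEST = 2.5           # GFLOPS/watt, upper end of the quoted range
GREEN500_LAST = 0.033         # GFLOPS/watt, bottom of the list


def gflops_per_watt(gflops, watts):
    """Energy efficiency: throughput divided by power draw."""
    return gflops / watts


arm_efficiency = gflops_per_watt(ARM_CLUSTER_GFLOPS, ARM_CLUSTER_WATTS)

# How far the all-ARM cluster sits from each end of the list.
gap_to_best = GREEN500_BEST / arm_efficiency
lead_over_last = arm_efficiency / GREEN500_LAST

print(f"all-ARM cluster: {arm_efficiency:.2f} GFLOPS/watt")
print(f"best listed system is about {gap_to_best:.1f}x more efficient")
print(f"cluster is about {lead_over_last:.1f}x better than the last entry")
```

On those numbers the all-ARM cluster lands well above the bottom of the list but more than an order of magnitude short of the leaders.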

What's the world CARMA to?

So the Spanish brainiacs went back to the drawing board and clustered 16 Nvidia CARMA* development boxes as a learning experience they called Pedraforca v1. This system did much better than the ARM-only Tibidabo on energy efficiency, yielding 0.78GFLOPS/watt while running the DGEMM matrix-multiplication benchmark, so they were making progress.
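DGEMM is the double-precision general matrix multiply from BLAS: C = alpha*A*B + beta*C. A benchmark run is conventionally credited with 2*m*n*k floating-point operations, and dividing that by the elapsed time gives the FLOPS figure. The sketch below is a deliberately slow pure-Python stand-in to show the bookkeeping, not the tuned GPU code such a benchmark would actually run; the matrix size is an arbitrary assumption.

```python
import time


def dgemm(alpha, a, b, beta, c):
    """Return alpha*A*B + beta*C for matrices stored as lists of rows."""
    m, n = len(a), len(b[0])
    b_cols = list(zip(*b))  # columns of B, so each dot product is a simple zip
    return [
        [
            alpha * sum(x * y for x, y in zip(a[i], b_cols[j])) + beta * c[i][j]
            for j in range(n)
        ]
        for i in range(m)
    ]


def dgemm_flops(m, n, k):
    """Conventional operation count: one multiply and one add per term."""
    return 2 * m * n * k


N = 64  # arbitrary small size so the pure-Python version finishes quickly
a = [[1.0] * N for _ in range(N)]
b = [[2.0] * N for _ in range(N)]
c = [[0.0] * N for _ in range(N)]

start = time.perf_counter()
result = dgemm(1.0, a, b, 0.0, c)
elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero reading

gflops = dgemm_flops(N, N, N) / elapsed / 1e9
print(f"{dgemm_flops(N, N, N)} operations in {elapsed:.4f}s = {gflops:.4f} GFLOPS")
```

Dividing the measured GFLOPS by the power drawn during the run gives the GFLOPS/watt figure quoted for the cluster.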

Limitations in the platform (such as the max speed of 400MB/s over the PCIe bus plus an inability to overlap computation and data transfers) meant it couldn’t scale up very well. However, it did lead them to a new breakthrough in their thinking for their next system, which they’ve dubbed Pedraforca v2.
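To see why the lack of overlap hurts, consider a rough timing model. It assumes the work is split into equal chunks, each needing one host-to-GPU transfer followed by one compute step; the chunk size and compute time below are made-up illustrative values, and only the 400MB/s figure comes from the talk.

```python
PCIE_BYTES_PER_SECOND = 400e6  # 400MB/s, the bus limit quoted for v1


def transfer_seconds(nbytes):
    """Time to push nbytes across the bus at the quoted rate."""
    return nbytes / PCIE_BYTES_PER_SECOND


def serial_seconds(chunks, chunk_bytes, compute_seconds):
    """No overlap: every chunk pays for its transfer and then its compute."""
    return chunks * (transfer_seconds(chunk_bytes) + compute_seconds)


def overlapped_seconds(chunks, chunk_bytes, compute_seconds):
    """Idealised double buffering: while one chunk computes, the next one
    transfers, so each middle step costs only the slower of the two."""
    t = transfer_seconds(chunk_bytes)
    return t + (chunks - 1) * max(t, compute_seconds) + compute_seconds


# Hypothetical workload: ten 40MB chunks, each taking 0.1s of GPU time.
CHUNKS, CHUNK_BYTES, COMPUTE_SECONDS = 10, 40e6, 0.1

no_overlap = serial_seconds(CHUNKS, CHUNK_BYTES, COMPUTE_SECONDS)
with_overlap = overlapped_seconds(CHUNKS, CHUNK_BYTES, COMPUTE_SECONDS)
print(f"without overlap: {no_overlap:.2f}s, with overlap: {with_overlap:.2f}s")
```

With these invented numbers the serial schedule takes about 2.0 seconds against roughly 1.1 seconds for the overlapped one. When transfer and compute times are evenly matched, being unable to overlap them costs nearly half the machine's time.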

They’ve decided the key to building a highly efficient system isn’t to erect an accelerated cluster but to build a cluster of accelerators. While there isn’t much difference in the words, there’s a world of difference between the meanings. For Pedraforca v2, they will decouple the CPUs from the GPUs, meaning that the ratio of general-purpose cores to graphics processor cores can be changed to fit the workloads. They will also use direct GPU-GPU data transfers via Mellanox’s ConnectX-3 InfiniBand interconnects.

This will take a huge amount of latency out of the system and, accordingly, reduce the amount of work the CPU needs to do to orchestrate GPU communications. The prototype system will have 64 nodes, each sporting a quad-core Tegra 3 CPU at 1.3GHz that will slide into a 4x PCIe slot on a Mini-ITX carrier. In this configuration, the CPU will only be managing boot and MPI communications, plus minimal traffic cop duty for the GPUs. The point is that you don’t need a hugely fast and powerful processor to fulfill these requirements.

However, Pedraforca v2 will have some processing power in the form of Kepler-based Nvidia K20 GPUs that can deliver 1,170GFLOPS through a PCIe Gen 3 slot. The GPUs will be able to communicate with each other at 40Gbps via the aforementioned Mellanox-fuelled InfiniBand interconnect.
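A quick back-of-the-envelope sketch puts those two figures side by side. It treats the 40Gbps link rate as usable payload bandwidth, which overstates what real traffic would see once encoding and protocol overhead are counted, so read the results as upper bounds. The 1GB message size is an arbitrary example.

```python
K20_PEAK_GFLOPS = 1170.0   # per-GPU peak quoted above
LINK_GBITS_PER_S = 40.0    # interconnect rate quoted above
V1_PCIE_MB_PER_S = 400.0   # bus limit quoted for Pedraforca v1

# Assumption: the full link rate is available for payload (optimistic).
link_gbytes_per_s = LINK_GBITS_PER_S / 8

# FLOPs a GPU must perform per byte received to stay busy at peak.
flops_per_byte = K20_PEAK_GFLOPS / link_gbytes_per_s

# Time to ship a hypothetical 1GB buffer between two GPUs.
seconds_per_gbyte = 1.0 / link_gbytes_per_s

# How the new link compares with the old 400MB/s bus limit.
speedup_over_v1_bus = (link_gbytes_per_s * 1000.0) / V1_PCIE_MB_PER_S

print(f"link bandwidth: {link_gbytes_per_s:.1f} GB/s")
print(f"FLOPs needed per byte moved: {flops_per_byte:.0f}")
print(f"1GB transfer: {seconds_per_gbyte:.2f}s")
print(f"versus the v1 bus limit: {speedup_over_v1_bus:.1f}x")
```

Even on these generous assumptions, each GPU needs to do a couple of hundred floating-point operations for every byte that crosses the network, which fits the presenters' point that the machine is aimed at GPU-optimised apps rather than general-purpose HPC.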

The chaps presenting this tech at GTC 2013 pointed out that this isn’t a general-purpose HPC system – it is intended as a host for apps that are GPU-optimised. While they didn’t discuss any FLOPS/watt estimates or performance predictions, it’s safe to say that Pedraforca v2 should be an eye opener when it comes to energy efficiency and even cost per FLOP. It’s definitely a project worth watching. ®

* CUDA on ARM architecture.
