AMD pins exascale vision on Fusion APUs

The rebirth of vector processing


Because Advanced Micro Devices has not yet announced its 16-core "Interlagos" Opteron 6200 processors, it has to talk about something, and in situations like that, it is best to talk about the far-off future. And so AMD rounded up a bunch of its partners on Wednesday in San Francisco for a shindig to talk about the challenges of exascale computing.

Chuck Moore, CTO in the chip maker's Technology Group, did the talking about exascale, or the desire to create machines that can deliver more than 1,000 petaflops of number-crunching performance. Moore was one of the lead architects of the "Bulldozer" core used in the forthcoming Opteron processors, as well as for the Fusion hybrid CPU-GPU chips, which AMD calls accelerated processing units, or APUs for short.

While Moore is hot on GPUs, he said this is not something new, so much as a return to the past with a twist. "GPU computing is still in its infancy," Moore explained. "Instead of thinking of computing on a GPU, you should be thinking of this as a revival of vector computing. Going forward, we will be developing GPUs that look more like vector computers."
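To make that distinction concrete, here is a minimal Python sketch (illustrative only, not AMD code) of the two execution models: the scalar model issues one operation per element per step, while the vector model conceptually applies one operation across whole operands, which is what classic vector machines and GPUs do in hardware.

```python
# Illustrative sketch of scalar vs vector execution models.
# Real vector hardware (and GPUs) runs the "vector" case across many
# parallel data lanes with a single instruction.

def scalar_add(a, b):
    # Scalar model: one add per loop iteration, as on a classic CPU core.
    result = []
    for x, y in zip(a, b):
        result.append(x + y)
    return result

def vector_add(a, b):
    # Vector model: conceptually a single operation over whole operands;
    # on vector or GPU hardware, the lanes execute in parallel.
    return [x + y for x, y in zip(a, b)]
```

Both produce the same answer; the difference that matters for power efficiency is how many instructions the hardware has to issue to get there.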

That got a big hallelujah from Peg Williams, senior vice president of HPC systems at supercomputer maker Cray, a descendant of one of the companies founded by Seymour Cray - the legendary supercomputer designer from Control Data (and, later, the company that bears his name), and a man who forgot more about vector processors than most experts will ever know.

The issue, said Moore, is not getting to exascale performance, but getting to exascale performance within a 20 megawatt power budget by 2020 or so.

When Moore and his colleagues were thinking about the design of the Bulldozer core, they did some math and figured that getting somewhere between 1 petaflops and 10 petaflops of performance would eat up around 10 megawatts of power, depending on the system architecture, the interconnect, and the scalability of the application software running on the massive cluster.

At the midpoint of that performance range, you are talking about needing 200 megawatts to power up an exascale machine. At $1 per watt per year, which is roughly what power costs these supercomputer labs, you are talking about needing $200m a year just to turn the machine on and cool it. So clearly, scaling up high-end x86 CPUs – or any big fat RISC chip – is not the answer.
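The arithmetic behind that power bill is worth spelling out. A quick Python sanity check, using only the figures quoted above (200 megawatts and $1 per watt per year):

```python
# Sanity-check the quoted power figures for a hypothetical exascale machine.
exaflop = 1e18                 # flops: 1,000 petaflops
power_watts = 200e6            # the quoted 200 megawatts

# Implied energy efficiency of such a machine:
gflops_per_watt = exaflop / power_watts / 1e9   # 5.0 gigaflops per watt

# Annual bill at the quoted $1 per watt per year:
annual_power_bill = power_watts * 1.0           # $200,000,000 a year
```

Five gigaflops per watt is the efficiency such a machine implies, and no 2011-vintage CPU-only design comes close, which is the whole argument for hybrid architectures.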

That's why Oak Ridge National Lab's "Titan" machine is a mix of CPUs and Nvidia GPUs embodied in a Cray XK6 chassis. That machine is expected to scale from 10 to 20 petaflops. At 20 petaflops, about 85 per cent of the oomph in the Titan machine will come from the GPUs, with the CPUs handling all the serial work needed to keep the GPUs fed with numbers to crunch.

The AMD plan, says Moore, is to get a 10 teraflops Fusion APU into the field that only consumes 150 watts, and to use this as the basis of an exascale machine. "You start to think that maybe we can get there," said Moore, saying that he would put a stake in the ground and predict an exascale system could be built by 2019 or 2020.
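Running the numbers on that target shows why Moore thinks it is plausible, at least on the compute side. A sketch (it deliberately ignores interconnect, memory, storage, and cooling overhead, which the APU figure excludes):

```python
# How many 10-teraflops, 150-watt Fusion APUs does a raw exaflop take?
apu_flops = 10e12      # 10 teraflops per APU (the stated target)
apu_watts = 150.0      # power draw per APU

apus_needed = 1e18 / apu_flops                       # 100,000 APUs
compute_megawatts = apus_needed * apu_watts / 1e6    # 15 megawatts

# The APUs alone would draw 15 MW, leaving headroom under the 20 MW
# exascale budget for memory, interconnect, storage, and cooling.
```

100,000 sockets is still a daunting machine to build and program, but a 15 megawatt compute budget is at least in the right ballpark.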

The issue is not the CPU or the GPU, but rather memory bandwidth: between the two devices, and between them and the main memory they will share. This will involve stacking memory in 3D configurations on the same chip package as these CPUs and GPUs.

"That is technology that doesn't exist today, but it will be here in time," Moore predicted.

The other trick will be to have a single memory address space for the GPUs and CPUs, but perhaps built from different memory technologies, creating segments of main memory better suited to CPUs or to GPUs, with the system steering data to the appropriate blocks whenever possible.

This idea, like all others, is not new, of course. There are probably many examples, but the one that comes to my mind is the single-level storage architecture of the System/38, AS/400 and Power Systems machines from IBM. They treated cache, main, and disk storage as a single addressable space, meaning that programmers didn’t have to worry about pointers and moving data from disk to memory and back.

It was done automagically by the operating system so RPG programmers could focus on the business logic in their programs instead of worrying about data management. This is precisely the goal that everyone has for future supercomputer applications that span multiple computing architectures.
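A loose modern analogue of that single-level idea is memory-mapped I/O, where the operating system pages data between disk and RAM behind one address space. A hedged Python sketch (the temporary file here is just a stand-in for "disk storage"):

```python
import mmap
import os
import tempfile

# Create a scratch file to play the role of disk storage.
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"\x00" * 4096)        # reserve one page on disk
    with mmap.mmap(fd, 4096) as region:
        region[0:5] = b"hello"          # looks like an ordinary memory write
    with open(path, "rb") as f:
        on_disk = f.read(5)             # ...but the kernel persisted it
finally:
    os.close(fd)
    os.remove(path)

# on_disk == b"hello": the program never called write() for that data;
# the OS moved the page between memory and disk automagically.
```

The programmer writes to what looks like memory, and the kernel decides when bytes actually move to disk, which is the same division of labour the System/38 offered its RPG programmers.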

The use of Fusion APUs in supercomputers got its start today. Penguin Computing, an AMD reseller, announced that it has sold a 59.6 teraflops system to Sandia National Labs, one of the big US Department of Energy compute facilities. The 104-node system is based on AMD's A8-3850 APU and is plunked into the Altus 2A00 rack server. And yes, it can play Crysis. ®
