EMC: I Have A Dream - of ABBA in every HPC setup

When you need to nuke an asteroid, take a chance on us

EMC has had a dream, a flash appliance called ABBA that helps mega-node HPC set-ups run faster and smoother.

ABBA is an acronym for the Active Burst Buffer Appliance. It was designed by Los Alamos National Laboratory (LANL) and EMC for massive, tightly-coupled high-performance computing (HPC), where nodes need to interact and jobs should not have to be restarted from the beginning if one or more nodes fail. Apparently - I'm no HPC expert - HPC jobs involving half a million compute cores, such as a LANL one simulating a nuclear weapon strike on an asteroid, have a series of checkpoints set up in their code, with the entire memory state stored at each checkpoint in a storage node.

LANL simulation of a nuclear weapon strike on an asteroid

LANL checkpoints every four hours today onto a storage subsystem with roughly 30,000 spindles. If a node fails, the job can be restarted from the preceding checkpoint, using the backup data on the storage nodes, instead of going back to the beginning. The compute nodes have to stop their calculations while they write to the storage nodes, so as the job run time and/or the number of nodes increases, the amount of wasted app compute time increases too.
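To make the restart logic concrete, here is a minimal Python sketch. It assumes an idealised job that checkpoints at fixed intervals and ignores the time spent writing each checkpoint; the function names and the failure time are hypothetical, and only the four-hour interval comes from the article.

```python
def restart_point(failure_hour: float, interval_hours: float) -> float:
    """Hour of the most recent checkpoint taken at or before the failure."""
    if interval_hours <= 0:
        raise ValueError("interval_hours must be positive")
    return (failure_hour // interval_hours) * interval_hours


def lost_work(failure_hour: float, interval_hours: float) -> float:
    """Hours of computation that must be redone after restarting."""
    return failure_hour - restart_point(failure_hour, interval_hours)


# Hypothetical failure 10.5 hours into a job that checkpoints every 4 hours.
print(restart_point(10.5, 4.0))  # 8.0
print(lost_work(10.5, 4.0))      # 2.5
```

Without any checkpoints the whole 10.5 hours would be lost; a shorter interval shrinks the loss, but costs more time spent writing checkpoints.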

With half a million compute nodes it takes time to transfer masses of data, petabytes of the stuff, to the storage nodes, and to restore it when nodes fail. Wouldn't it be a good thing if this could be done using flash-based storage nodes instead of disk drive-based ones? That would speed things up by getting rid of disk drive latency, and also steer the set-up away from disk failures.

The flash storage nodes could connect to the compute nodes across higher-speed links than the HDD-based nodes, to make things faster still. With two flash appliances per compute node, one could receive the latest checkpoint data while the second writes the previous checkpoint data to disk. That would enable more frequent checkpoints, reduce the delay caused by a job restart even more, and keep the compute nodes operating more continuously.
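The overlap described above can be sketched as a toy timeline model in Python. This is an illustration under stated assumptions, not ABBA's actual design: the function names, the round-robin use of buffers, and all the timings (in minutes) are hypothetical.

```python
def direct_runtime(n_checkpoints: int, compute_interval: float,
                   disk_write: float) -> float:
    """Compute nodes stall for the full disk write at every checkpoint."""
    return n_checkpoints * (compute_interval + disk_write)


def buffered_runtime(n_checkpoints: int, compute_interval: float,
                     flash_write: float, disk_drain: float,
                     n_buffers: int = 2) -> float:
    """Compute nodes stall only for the flash write; each buffer drains
    to disk in the background while computation carries on."""
    now = 0.0
    free_at = [0.0] * n_buffers          # when each buffer finishes draining
    for k in range(n_checkpoints):
        now += compute_interval          # compute phase
        b = k % n_buffers                # alternate between buffers
        now = max(now, free_at[b])       # wait if this buffer is still draining
        now += flash_write               # stall while writing to flash
        free_at[b] = now + disk_drain    # drain overlaps with later compute
    return now                           # the final drain is not counted


# Hypothetical timings: 240-minute compute interval, 60-minute disk write,
# 5-minute flash write.
print(direct_runtime(3, 240.0, 60.0))         # 900.0
print(buffered_runtime(3, 240.0, 5.0, 60.0))  # 735.0
```

In this model the compute nodes only ever wait on the disk if a buffer is asked to take a new checkpoint before it has finished draining the old one, which is what the second appliance is there to avoid.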

If the HPC setup has IO nodes interconnecting the compute and storage nodes then flash-based storage nodes could be the IO nodes as well, simplifying the overall design.

ABBA is this flash storage node and it is intended for fast big data situations, like the LANL ones. If HPC continues its massively parallel growth, then theoretically we are heading towards a billion cores and exaflop computing. EMC scientists Sorin Faibish and John Bent in the Fast Data Group in EMC's Office of the CTO are working on ABBA with others. They have produced a detailed slide deck (pdf).

In it they make the point that ABBA is file system-agnostic and suggest using ABBA could improve compute efficiency by 40 percent, where compute efficiency is the app compute time divided by the sum of the app compute time and checkpoint time.
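The efficiency ratio defined above is easy to express in code. The following Python sketch uses made-up checkpoint durations purely for illustration; they are not taken from the EMC slide deck and are not meant to reproduce the 40 percent figure.

```python
def compute_efficiency(compute_time: float, checkpoint_time: float) -> float:
    """App compute time divided by (app compute time + checkpoint time)."""
    total = compute_time + checkpoint_time
    if total <= 0:
        raise ValueError("total time must be positive")
    return compute_time / total


# Hypothetical: 240 minutes of compute per checkpoint interval.
slow = compute_efficiency(240.0, 60.0)  # 60-minute checkpoint to disk
fast = compute_efficiency(240.0, 10.0)  # 10-minute checkpoint to flash
print(slow)  # 0.8
print(fast)  # 0.96
```

The shorter the checkpoint time relative to the compute time, the closer the ratio gets to 1.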

ABBA nodes could also provide co-processing analysis and visualisation of the HPC jobs. Virtual Geek has more information, including some videos:

- Chad's World LIVE II from EMC World 2012. Fast forward to 44.30 to get to the ABBA part. This is a really cool video. Who is the dancing queen?
- Simplifying HPC Architectures.
- Demystifying Fast Data which looks at the Los Alamos challenge.

There is also a Big Ideas video mentioned.

Ah, EMC and flash; the winner takes it all. ®
