DOE doles out cash to AMD, Whamcloud for exascale research

Pushing compute, memory, and I/O to the limits

The burden of memory

On the memory front, DRAM failure rates are higher than expected and density improvements in memory chips are not coming fast enough. So DOE wants researchers to explore the use of in-memory processing – literally putting tiny compute elements in the memory to do vector math or scatter/gather operations – as well as the integration of various forms of non-volatile storage into exascale systems.

The nuke labs are thinking that 500GB of NVRAM of some sort per socket will do the trick. While 4TB/sec of bandwidth is a baseline, DOE really wants 10TB/sec.

Parallel storage subsystems generally hold up better than compute nodes on petascale systems these days, with the DOE estimating that the mean time between application failures due to a storage issue is around 20 days. Without any substantial changes to storage architectures, that will drop to 14 days by 2020. Disk capacity is increasing at a decent clip, but disk performance is not. Solid state drives are fast, but they ain't cheap.

If availability is not as big of an issue for exabyte-class storage, then scale surely is. That exascale system in 2020 will have between 100,000 and 1 million nodes, somewhere between 100 million and 1 billion computing elements, and somewhere between 30PB and 60PB of memory, across all of which some sort of concurrency will have to be provided to run applications.

This behemoth will require from 600PB to 3,000PB of disk capacity. In effect, the disk array for an exascale compute farm will be an exascale system in its own right, with peak I/O burst rates on the order of 200TB/sec and metadata transaction rates on the order of 100MB/sec.
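To put those ranges in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes decimal units (1PB = 1,000,000GB) and pairs the extremes of each quoted range against one another. That pairing is an assumption made for illustration, not something DOE specified, and the variable and function names are invented for this sketch.

```python
# Figures quoted in the article; decimal units are an assumption.
GB_PER_PB = 1_000_000

nodes = (100_000, 1_000_000)   # node count range
memory_pb = (30, 60)           # system memory range, in PB
disk_pb = (600, 3_000)         # disk capacity range, in PB


def memory_per_node_gb(mem_pb, node_count):
    """Average memory per node in GB, assuming memory is spread evenly."""
    return mem_pb * GB_PER_PB / node_count


# Least memory spread over the most nodes, and the reverse.
smallest_node_gb = memory_per_node_gb(memory_pb[0], nodes[1])
largest_node_gb = memory_per_node_gb(memory_pb[1], nodes[0])

# Disk capacity as a multiple of system memory, at both extremes.
disk_ratio_low = disk_pb[0] / memory_pb[1]
disk_ratio_high = disk_pb[1] / memory_pb[0]

print(f"Memory per node: {smallest_node_gb:.0f}GB to {largest_node_gb:.0f}GB")
print(f"Disk capacity: {disk_ratio_low:.0f}x to {disk_ratio_high:.0f}x memory")
```

Under those assumptions, the average node carries anywhere from 30GB to 600GB of memory, and the disk farm holds between 10 and 100 times the machine's total memory.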

For the FastForward storage research projects, DOE wants a storage system that can keep the fully running exascale system fed, without crashing, for 30 days or more, and the mean time between unrecoverable data losses should be 120 days or higher. The system has to manage this with the storage array crammed to 80 per cent of capacity and performing full memory dumps from the system every hour.
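The hourly memory dump is easier to appreciate with some quick arithmetic. The Python sketch below assumes decimal units (1PB = 1,000TB) and assumes the whole dump streams at the 200TB/sec peak burst rate quoted for the storage array, which is an optimistic simplification. The function name is made up for this illustration.

```python
# Figures quoted in the article; decimal units are an assumption.
TB_PER_PB = 1_000
SECONDS_PER_HOUR = 3_600

burst_tb_per_sec = 200   # peak I/O burst rate of the storage array
memory_pb = (30, 60)     # system memory range, in PB


def dump_seconds(mem_pb, rate_tb_per_sec):
    """Seconds to write all of memory to storage at a sustained rate."""
    return mem_pb * TB_PER_PB / rate_tb_per_sec


for mem in memory_pb:
    seconds = dump_seconds(mem, burst_tb_per_sec)
    share = seconds / SECONDS_PER_HOUR
    print(f"{mem}PB dump: {seconds:.0f} seconds, {share:.1%} of each hour")
```

Even at full burst speed, a machine with 60PB of memory would spend about five minutes of every hour doing nothing but writing out its own state. Any shortfall from the peak rate stretches that window proportionally.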

Data integrity algorithms for storage can impose no more than 10 per cent overhead on the metadata servers at the heart of the storage array. Metadata insert rates are expected to be on the order of 1 million to 100 million per second, and lookups and retrievals are expected to be on the order of 100,000 to 10 million per second out of the metadata servers.

During peak system writing and reading operations, the metadata servers can't take any more than a 25 per cent performance degradation hit, and DOE would really like it to be 10 per cent.

No big deal, right?

So, good luck, AMD, Whamcloud, and friends.

The winners

AMD received research grants under the FastForward portion of the DOE Extreme-Scale Computing program for both processing and memory research, and according to Alan Lee, corporate vice president for advanced research and development at the chip maker, the reason is that the two are interrelated.

Lee was not able to elaborate much on the research plans AMD has put together, but he did confirm to El Reg that AMD would be focusing on research to push its hybrid CPU-GPU processors, what the company calls its Accelerated Processing Units or APUs. On the memory side, AMD is looking at different types of memory, different structures and hierarchies of memory, and different relationships between these memories and the APUs. This will, of course, necessarily involve system interconnect work.

"Moving data around to feed the beast is critical for exascale," explained Lee, adding that the SeaMicro acquisition earlier this year was not done for this DOE work, but the interconnect expertise that AMD gained through that acquisition would be put to good use.

AMD researchers have already identified a subset of key memory technologies that they think will be applicable to exascale-class systems, and this is what the research will focus on. AMD is not throwing the whole kitchen sink of possible volatile and non-volatile memories into the mix.

Lee was not at liberty to say what memory technologies AMD was looking at – that would be helping its inevitable competition. AMD has received a grant of $3m for the memory research and $12.6m for the processor research. It is interesting that AMD was able to bag these contracts all by its lonesome, especially after the DOE said that it wanted multiple companies cooperating on the work.

On the storage front, Whamcloud, the company that was formed in July 2010 to support and extend the open source Lustre file system, is the lead contractor and is soliciting help from a bunch of others.

Whamcloud is managing the project and lending its Lustre file system expertise. It is relying on HDF Group for application I/O expertise, EMC for system I/O and I/O aggregation skills, and Cray for scale-out testing of the storage systems. This exascale storage system will have a mix of flash and disk drives.

The word on the street is that Whamcloud received around $8m for its FastForward grant. ®