The burden of memory
On the memory front, DRAM failure rates are higher than expected and density improvements in memory chips are not coming fast enough. So DOE wants researchers to explore the use of in-memory processing – literally putting tiny compute elements in the memory to do vector math or scatter/gather operations – as well as the integration of various forms of non-volatile storage into exascale systems.
The nuke labs are thinking that 500GB of NVRAM of some sort per socket will do the trick. While 4TB/sec of bandwidth is a baseline, DOE really wants 10TB/sec.
Parallel storage subsystems generally hold up better than compute nodes on petascale systems these days, with the DOE estimating that the mean time between application failures due to a storage issue is around 20 days. Without any substantial changes to storage architectures, that will drop to 14 days by 2020. Disk capacity is increasing at a decent clip, but disk performance is not. Solid state drives are fast, but they ain't cheap.
If availability is not as big of an issue for exabyte-class storage, then scale surely is. That exascale system in 2020 will have between 100,000 and 1 million nodes, and will have somewhere between 100 million and 1 billion computing elements, with somewhere between 30PB and 60PB of memory, and across which some sort of concurrency will have to be provided to run applications.
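To get a feel for what those ranges mean per node, here is a rough back-of-the-envelope sketch. It only divides the quoted totals by the quoted node counts; the helper name `per_node_range` and the use of decimal units are assumptions made for illustration, not anything DOE has specified.

```python
# Decimal (SI) units; an assumption, since the article does not say which it uses.
GB = 10**9
PB = 10**15

def per_node_range(total_lo, total_hi, nodes_lo, nodes_hi):
    """Return (smallest, largest) per-node share for a range of totals and node counts.

    The smallest share pairs the low total with the high node count,
    the largest pairs the high total with the low node count.
    """
    return total_lo / nodes_hi, total_hi / nodes_lo

# Ranges quoted in the article for the 2020 exascale system.
NODES = (100_000, 1_000_000)
ELEMENTS = (100_000_000, 1_000_000_000)
MEMORY_BYTES = (30 * PB, 60 * PB)

elements_per_node = per_node_range(*ELEMENTS, *NODES)
memory_per_node_gb = tuple(b / GB for b in per_node_range(*MEMORY_BYTES, *NODES))

print(f"computing elements per node: {elements_per_node[0]:,.0f} to {elements_per_node[1]:,.0f}")
print(f"memory per node: {memory_per_node_gb[0]:,.0f}GB to {memory_per_node_gb[1]:,.0f}GB")
```

These are outer bounds only. The article does not say which end of one range goes with which end of another, so the real design point could sit anywhere in between.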
This behemoth will require from 600PB to 3,000PB of disk capacity. In effect, the disk array for an exascale compute farm will be an exascale system in its own right, with peak I/O burst rates on the order of 200TB/sec and metadata transaction rates on the order of 100MB/sec.
For the FastForward storage research projects, DOE wants a storage system that can keep the fully running exascale system fed, without crashing, for 30 days or more, with a mean time between unrecoverable data loss of 120 days or higher – and it has to do so with the storage array crammed to 80 per cent of capacity and performing full memory dumps from the system every hour.
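The hourly memory dump is easier to appreciate with some quick arithmetic. The sketch below combines the 30PB to 60PB of system memory and the 200TB/sec peak burst rate quoted earlier. It assumes, optimistically, that a dump writes all of memory and runs at the full burst rate the whole time; the function name `dump_seconds` is made up for this example.

```python
# Decimal (SI) units; an assumption, since the article does not say which it uses.
TB = 10**12
PB = 10**15
SECONDS_PER_HOUR = 3600

def dump_seconds(memory_pb, burst_tb_per_sec=200):
    """Seconds to write memory_pb petabytes at a sustained burst_tb_per_sec.

    Assumes the whole of memory is dumped and the array holds its peak
    burst rate throughout, which is a best case rather than a prediction.
    """
    return (memory_pb * PB) / (burst_tb_per_sec * TB)

for memory_pb in (30, 60):
    secs = dump_seconds(memory_pb)
    share = secs / SECONDS_PER_HOUR
    print(f"{memory_pb}PB dump: {secs:.0f} seconds, {share:.1%} of each hour")
```

Under those assumptions a full dump takes between 150 and 300 seconds, or roughly 4 to 8 per cent of every hour, before any other I/O gets a look-in.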
Data integrity algorithms for storage can impose no more than 10 per cent overhead on the metadata servers at the heart of the storage array. Metadata insert rates are expected to be on the order of 1 million to 100 million per second, and lookup and retrievals are expected to be on the order of 100,000 to 10 million per second out of the metadata servers.
During peak system writing and reading operations, the metadata servers can't take any more than a 25 per cent performance degradation hit, and DOE would really like it to be 10 per cent.
No big deal, right?
So, good luck, AMD, Whamcloud, and friends.
AMD received research grants under the FastForward portion of the DOE Extreme-Scale Computing program for both processing and memory research, and according to Alan Lee, corporate vice president for advanced research and development at the chip maker, the reason is that the two are interrelated.
Lee was not able to elaborate much on the research plans AMD has put together, but he did confirm to El Reg that AMD would be focusing on research to push its hybrid CPU-GPU processors, what the company calls its Accelerated Processing Units or APUs. On the memory side, AMD is looking at different types of memory, different structures and hierarchies of memory, and different relationships between these memories and the APUs. This will, of course, necessarily involve system interconnect work.
"Moving data around to feed the beast is critical for exascale," explained Lee, adding that the SeaMicro acquisition earlier this year was not done for this DOE work, but the interconnect expertise that AMD gained through that acquisition would be put to good use.
AMD researchers have already identified a subset of key memory technologies that they think will be applicable to exascale-class systems, and this is what the research will focus on. AMD is not throwing the whole kitchen sink of possible volatile and non-volatile memories into the mix.
Lee was not at liberty to say what memory technologies AMD was looking at – that would be helping its inevitable competition. AMD has received a grant of $3m for the memory research and $12.6m for the processor research. It is interesting that AMD was able to bag these contracts all by its lonesome, especially after the DOE said that it wanted multiple companies cooperating on the work.
On the storage front, Whamcloud, the company that was formed in July 2010 to support and extend the open source Lustre file system, is the lead contractor and is soliciting help from a bunch of others.
Whamcloud is managing the project and lending its Lustre file system expertise and is relying on HDF Group for application I/O expertise, EMC for system I/O and I/O aggregation skills, and Cray for scale-out testing of the storage systems. This exascale storage system will have a mix of flash and disk drives.
The word on the street is that Whamcloud received around $8m for its FastForward grant. ®
I'm all for appropriate research
I am concerned, however, that the DOE is forking out a lot of money with little clue. The DOE hasn't had any reasonable or legitimate energy policy in 50 years, so to just throw taxpayer money around like a drunk elected official is unwise.