Nuke-whisperers stuff terabytes of flash into heretical 'Catalyst' super
Intel, Cray, and Lawrence Livermore rethink supercomputer design
Three technological heavyweights have come together to spin up a radically different supercomputer cluster designed to crunch "big data" workloads rather than the simulation and modeling jobs of typical HPC rigs.
The collaboration between Lawrence Livermore National Laboratory, Intel, and Cray was announced on Monday and sees the trio fire up a high-performance computing system named 'Catalyst' that has an order of magnitude more memory than any system that has gone before it.
Catalyst has 304 dual-socket compute nodes equipped with 2.4GHz 12-core Xeon E5-2695v2 processors backed by 128GB DRAM, along with the Intel TrueScale Fabric. So far, so super – what makes this system different is the whopping 800GB of flash memory attached via PCIe per node. Boffins want to convert slabs of solid-state storage into a secondary tier of memory.
Intel, Cray, and LLNL are going to use the system to crack "big data" problems, and in doing so investigate the way that new systems can be designed to take advantage of much faster memory mediums – a crucial investigation, given the likely arrival of some form of next-generation non-volatile RAM (such as HP's Memristor) in the next few years.
Initially, LLNL will use the system to test out a new "data intensive" technique of mapping the solid-state drives into application memory, "making the flash stores look like standard DRAM" to software, Matt Leininger, a deputy for Advanced Technology Projects within LLNL, told us. He stressed, though, that apps "need some smarts about what it caches in DRAM versus the flash. This machine is a way to scale that [approach] out from two to three to five nodes to several hundred."
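For readers who want a feel for what "mapping flash into application memory" means in practice, here is a minimal sketch using Python's standard mmap module. It is not LLNL's technique, which the lab did not detail: the temporary file merely stands in for a file on a PCIe flash card, and the names create_backing_file, read_record and demo are made up for illustration.

```python
import mmap
import os
import struct
import tempfile

RECORD = struct.Struct("<d")  # one little-endian 64-bit float per record


def create_backing_file(path, n_records):
    """Write n_records floats (0.0, 1.0, 2.0, ...) to the backing file."""
    with open(path, "wb") as f:
        for i in range(n_records):
            f.write(RECORD.pack(float(i)))


def read_record(mapped, index):
    """Read one record by offset; the OS pages it in from storage on demand."""
    return RECORD.unpack_from(mapped, index * RECORD.size)[0]


def demo(n_records=1000):
    # Assumption: a temp file stands in for a file living on the flash device.
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        create_backing_file(path, n_records)
        with open(path, "r+b") as f:
            # Length 0 maps the whole file into the process address space.
            with mmap.mmap(f.fileno(), 0) as mapped:
                first = read_record(mapped, 0)
                last = read_record(mapped, n_records - 1)
                # Writes go through the same mapping, like a store to memory.
                RECORD.pack_into(mapped, 0, 42.0)
                updated = read_record(mapped, 0)
        return first, last, updated
    finally:
        os.remove(path)
```

The application code indexes into the mapping as if it were an in-memory buffer, while the operating system decides which pages actually sit in DRAM at any moment. That convenience is also the catch Leininger points to: the program cannot see which accesses are cheap and which go out to flash.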
The system works along the lines of hardware from the likes of Fusion-io, which offers storage that is faster than disk and almost as fast as RAM for software to shift data around in. One area of concern is the difference in access times between DRAM and the attached Intel flash, which will require new ways to juggle memory allocation in big apps, Leininger admitted.
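One common way to handle a fast-but-small tier in front of a slow-but-big one is a least-recently-used cache. The sketch below illustrates that general idea only; the article does not say which policy LLNL's software uses, and the TieredStore class and its fields are hypothetical.

```python
from collections import OrderedDict


class TieredStore:
    """Small fast tier (stand-in for DRAM) in front of a large slow tier (stand-in for flash)."""

    def __init__(self, slow_tier, fast_capacity):
        self.slow_tier = slow_tier          # any mapping of key -> value
        self.fast_capacity = fast_capacity  # max entries kept in the fast tier
        self.fast_tier = OrderedDict()      # ordered least -> most recently used
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.fast_tier:
            self.hits += 1
            self.fast_tier.move_to_end(key)     # mark as most recently used
            return self.fast_tier[key]
        self.misses += 1
        value = self.slow_tier[key]             # slow path: fetch from the big tier
        self.fast_tier[key] = value
        if len(self.fast_tier) > self.fast_capacity:
            self.fast_tier.popitem(last=False)  # evict the least recently used entry
        return value
```

The hits and misses counters are there because the ratio between them is what decides whether flash-as-memory feels like DRAM or like a very quick disk. An access pattern that keeps returning to the same small working set will mostly hit the fast tier; one that sweeps through the whole data set will miss almost every time.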
So, what does all of this have to do with "big data"?
"In traditional HPC the simulation and modeling techniques are typically based on scientific models that have underlying mathematics or physics partial differential equations," Mark Seager, chief technology officer of Intel's advanced computing group, says. "That starts out with a very small amount of data and evolves over-time in a time-stepping manner generating lots and lots of data as it progresses."
"In that environment, for that type of computation, you really want to maximize floating-point operations per second per dollar that you invest in. The second most important investment there is interconnect, then memory and IO."
Small cluster, big memory
But with big-data applications where the cluster must analyze a ton of data that has been generated elsewhere and streamed in – for instance, telemetry from nationwide utility grids, or from geophysical exploration – the infrastructure demands almost reverse. Fast memory – and lots of it – becomes a priority.
"You start with a big amount of data and typically it's on disk and when you do the computation you have to figure out an efficient way to get it off the disks and into the filesystem," Seager says. "Disk is woefully slow and getting slower... NVRAM is an opportunity to get very fast random access to that data."
This approach represents a "major departure from classic simulation-based computing architectures common at US Department of Energy laboratories and opens new opportunities for exploring the potential of combining floating point focused capability with data analysis in one environment," Intel wrote in a statement announcing the system. "Consequently, the insights provided by Catalyst could become a basis for future commodity technology procurements."
Along with each node getting access to 800GB of NVRAM, the system comes with dual rail Quad Data Rate (QDR-80) networking fabric, which gives each CPU its own dedicated I/O service. Previously, one socket would get the direct network link and the secondary one would have to talk across QPI.
"By having the dual rail one per socket tightly coupled we can do [stuff] with those flash devices without having to cross the QPI socket," Seager said. "We can double the effective messaging rate."
The combination of this fabric technology with the NVRAM gives Catalyst a cross-cluster bandwidth of half a terabyte per second, which is equivalent to that of the original incarnation of LLNL's whopping 16-petaflop "Sequoia" system, which was the world's fastest HPC rig in June 2012.
The difference is the bandwidth achieved for Catalyst is "an order of magnitude less expensive because the filesystem for Sequoia is based on rotating disks," Seager said.
The full Cray CS300 cluster is capable of 150 teraflops using 304 compute nodes, 12 Lustre router nodes (128GB DRAM and 3,200GB NVRAM), two login nodes (128GB DRAM), and two management nodes. Each compute node gets 800GB of NVRAM. The NVRAM comes from Intel's SSD 910 Series 800GB 1/2 height PCIe 2.0, multi-level cell flash.
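To put those per-node numbers in cluster terms, the arithmetic can be sketched as follows. It uses only the node counts and capacities quoted above, and it assumes the "GB" figures are simple decimal labels that can be multiplied out directly; the function name cluster_totals is made up for illustration.

```python
COMPUTE_NODES = 304
ROUTER_NODES = 12
DRAM_PER_COMPUTE_NODE_GB = 128
NVRAM_PER_COMPUTE_NODE_GB = 800
NVRAM_PER_ROUTER_NODE_GB = 3200


def cluster_totals():
    """Aggregate DRAM and NVRAM across the compute and Lustre router nodes."""
    compute_dram = COMPUTE_NODES * DRAM_PER_COMPUTE_NODE_GB
    compute_nvram = COMPUTE_NODES * NVRAM_PER_COMPUTE_NODE_GB
    router_nvram = ROUTER_NODES * NVRAM_PER_ROUTER_NODE_GB
    return {
        "compute_dram_gb": compute_dram,                # 38,912
        "compute_nvram_gb": compute_nvram,              # 243,200
        "router_nvram_gb": router_nvram,                # 38,400
        "total_nvram_gb": compute_nvram + router_nvram, # 281,600
        # Flash per compute node relative to its DRAM: 6.25
        "nvram_to_dram_ratio": NVRAM_PER_COMPUTE_NODE_GB / DRAM_PER_COMPUTE_NODE_GB,
    }
```

In other words, the compute nodes carry roughly 243TB of flash against about 39TB of DRAM, so each node has a little over six times as much NVRAM as conventional memory.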
Catalyst's arrival is sure to delight Jean-Luc Chatelain, an executive vice president at DataDirect Networks, who predicted to El Reg a year ago that 2014 would see the arrival of NVRAM as a major storage tier for HPC data. ®