NASA on track to triple Discover super's grunt
The second stage is about to fire the space agency's compute power to the stars
The rolling upgrade at NASA's Center for Climate Simulation (NCCS) is nearing completion, with the agency just about ready to flick the switch on the second of three new SGI systems.
The upgrade project, which SGI won (canned announcement) in November 2014, will on completion replace the former IBM machine and its 35,500-plus Intel Xeon cores with more than 64,500 Haswell cores, each with 4 GB-plus of memory.
According to NASA, the SGI kit will replace units dating from 2011.
The project is two-thirds complete now, with the 30,000-plus-core Scalable Compute Unit (SCU) 10 online since January and the 16,800-core SCU 11 currently undergoing system tests.
The project should be complete by the end of May 2015, when SCU 12 goes live.
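The core counts quoted above imply a rough size for SCU 12, which the article does not state. The following is a back-of-the-envelope sketch only: the figures are the article's rounded lower bounds ("plus", "more than"), and the function and variable names are illustrative assumptions, not anything NASA or SGI publishes.

```python
# Figures quoted in the article; "plus" and "more than" make them lower bounds.
TOTAL_CORES = 64_500   # all three SCUs once the upgrade is complete
SCU10_CORES = 30_000   # online since January
SCU11_CORES = 16_800   # currently undergoing system tests
OLD_CORES = 35_500     # former IBM machine


def remaining_cores(total, *delivered):
    """Cores left over once the already-delivered units are subtracted."""
    return total - sum(delivered)


# Rough estimate only: every input is a rounded figure from the article.
scu12_estimate = remaining_cores(TOTAL_CORES, SCU10_CORES, SCU11_CORES)
core_ratio = TOTAL_CORES / OLD_CORES

print(f"Estimated SCU 12 size: about {scu12_estimate:,} cores")
print(f"New-to-old core ratio: {core_ratio:.2f}x")
```

On these numbers SCU 12 would land somewhere near 17,700 cores, and the raw core count grows by a bit less than a factor of two. Any tripling of "grunt" would therefore have to come from per-core performance as well as core count, a point the article does not spell out.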
NASA says SCU 10's primary project is the “Downscaling Project”, which aims to improve the spatial resolution of climate models. That project uses the SCU's 138 terabytes of memory to run the GEOS-5 model at 12 km resolution, with regional models going down to 4 km.
“Scientists are comparing how well the models predict three weather phenomena impacting the continental United States: Northeast wintertime storms, midcontinent summertime storms, and West Coast wintertime atmospheric rivers”, NASA's article says.
Each 28-core node has a 56 Gbps non-blocking InfiniBand connection, and the Discover facility's system-wide storage is being doubled to 33 petabytes.
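As a sanity check, the SCU 10 figures quoted in this article (30,000-plus cores, 138 terabytes of memory, 28 cores per node) can be cross-checked against the "4 GB-plus" per-core memory claim. This is a minimal sketch that assumes decimal units (1 TB = 1,000 GB) and treats the rounded core count as exact; the helper name is hypothetical.

```python
# Figures for SCU 10 as quoted in the article (rounded, so results are approximate).
SCU10_CORES = 30_000       # "30,000-plus-core", treated here as exact
SCU10_MEMORY_TB = 138      # total memory of SCU 10
CORES_PER_NODE = 28        # "Each 28-core node"
GB_PER_TB = 1_000          # assumption: decimal terabytes


def memory_per_core_gb(memory_tb, cores):
    """Average memory per core in GB, assuming memory is spread evenly."""
    return memory_tb * GB_PER_TB / cores


gb_per_core = memory_per_core_gb(SCU10_MEMORY_TB, SCU10_CORES)
gb_per_node = gb_per_core * CORES_PER_NODE
approx_nodes = SCU10_CORES / CORES_PER_NODE

print(f"Memory per core: about {gb_per_core:.1f} GB")
print(f"Memory per node: about {gb_per_node:.0f} GB")
print(f"Node count: about {approx_nodes:.0f}")
```

The result, roughly 4.6 GB per core, is consistent with the "4 GB-plus" figure. Because the real core count is somewhat above 30,000, the true per-core value would be slightly lower than this estimate.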
NASA writes that the upgrade demanded “planning for nearly 1 megawatt of power and 400 tons of cooling, ensuring the vendor factory configures the racks for optimal onsite operations, and acquiring 10 nodes for the NCCS Test and Development System (TDS) to prepare the operating system (OS) and software stack (compilers, file system, MPI, etc.).”
Sensibly, the post also notes that sysadmins “scrub all NASA data off the old hardware before physically removing it”.