Square Kilometre Array data heading for AWS cloud
New Oz supers vaporising fast, in a good way
Whether it's weather or Earth or whether an object exists in extra-galactic space, cloud computing disciplines are changing the face of research at facilities such as the National Computational Infrastructure (NCI) and the International Centre for Radio Astronomy Research (ICRAR).
The National Computational Infrastructure, which during June tested the limits of its brand-new Fujitsu super (Raijin), is adding what it described to Vulture South as the “third apex of the triangle”, slinging $AU2m in the direction of Dell for a 3,200-core compute cloud. Dr Joseph Antony, manager of cloud, online, and data-intensive compute at the NCI, explained to The Register that the nearly 60,000-core Raijin supercomputer and the petascale data store form the other two corners.
Dr Antony explained that the cloud node targets “workloads that don't suit” Raijin, but need access to the same data store.
“The key thing is that the cloud node can see the high-performance computational resources, as well as the petascale data store,” he explained. The pre-processing and post-processing of climate data and satellite images, for instance, are better offloaded from the supercomputer.
“For example, Australia's Bureau of Meteorology might run a large climate simulation in Raijin, deliver the output to the Lustre filesystem, and then you could have post-processing steps. The data could be ingested by a cloud workflow that would prepare it for dissemination around the world.”
Workloads not suited to Raijin, he said, are characterised by very heavy I/O – “disk-intensive, very 'seeky', very random” – in fields like bioinformatics or satellite image processing.
To make sure it can handle those kinds of loads, the NCI cloud has 150 TB of SSD storage and 56 Gbps of Mellanox Ethernet.
The Dell cloud will also allow NCI to present virtual instances of Raijin to the research community around the country. NCI director Professor Lindsay Botten said that for scientists who need the Raijin software environment but not the super-high-performance backplane, Dr Antony's group will be building “an operating environment that is essentially Raijin in a box,” so that researchers will “see the same software library and the same compute environment”.
ICRAR using AWS to help design supers
Although ICRAR is taking a completely different approach to its use of the cloud, there is a common thread in the considerations: an interest in the way different workloads align with different environments.
As Professor Andreas Wicenec explained to The Register, that's going to be of critical importance in the design of the environments that crunch the numbers for the Square Kilometre Array project, which is beginning to take shape with the recent switch-on of the Murchison Widefield Array (MWA) in Western Australia.
Even the MWA is shipping impressively large amounts of data to the Pawsey Centre in Perth – around 400 megabytes per second – and will be piling up three petabytes of data each year at Pawsey.
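A quick back-of-the-envelope check puts those two figures side by side. The sketch below is not from ICRAR or Pawsey: it assumes decimal units (1 MB = 10^6 bytes, 1 PB = 10^15 bytes) and a 365-day year, and the function names are made up for illustration. The article does not say how the 400 MB/s rate and the three-petabyte yearly total relate, so the calculation only shows what a constant 400 MB/s would add up to, and what fraction of that total three petabytes represents.

```python
# Assumptions (not from the article): decimal units and a 365-day year.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds
BYTES_PER_MB = 10**6
BYTES_PER_PB = 10**15


def yearly_volume_pb(rate_mb_per_s):
    """Petabytes accumulated in a year at a constant rate in MB/s."""
    return rate_mb_per_s * BYTES_PER_MB * SECONDS_PER_YEAR / BYTES_PER_PB


def implied_fraction(rate_mb_per_s, stored_pb_per_year):
    """Share of the constant-rate yearly volume that the stored total represents."""
    return stored_pb_per_year / yearly_volume_pb(rate_mb_per_s)


if __name__ == "__main__":
    print(f"{yearly_volume_pb(400):.1f} PB per year at a constant 400 MB/s")
    print(f"{implied_fraction(400, 3):.2f} of that is 3 PB")
```

On those assumptions, a link running flat out at 400 MB/s all year would deliver about 12.6 PB, so three petabytes is roughly a quarter of that. Whether the gap comes from the rate being a peak figure, from data reduction before archiving, or from something else is not stated here.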
Some of the architectural considerations are already known: neither networks nor storage will cope with keeping every bit of raw data that flows from the SKA, so there will have to be pre-processing to decide what to keep; and at each stage – from the SKA site to the computing infrastructure to the rest of the world – people like Professor Wicenec will have to specify the architecture to best serve the project.
“That's what we're figuring out for different science cases,” he told The Register. “The baseline design is being fixed at this time, but there are all kinds of unknowns. There is a fair bit of discussion of what computing will be needed onsite, for example.
“Some of the work might involve FPGAs or ASICs, while other workloads might need GPUs or accelerator technologies.”
As an input to the detailed design work, ICRAR has also called on cloud computing – but instead of building a cloud, it's buying one, specifically as a customer of Amazon Web Services. Part of the reason for this, according to research associate professor Kevin Vinsen, is to use AWS as the basis for various experiments that help ICRAR understand where cloud architectures can be useful, so that information can feed back into the design process.
“Each six months, we might be allocated a few hundred CPU hours,” Professor Vinsen said – making it important not to waste that time on the wrong workloads. “Sometimes you just want to run up an experiment, then shut it down rather than waiting for our allocation.”
And the cloud environment has already proved valuable for “things that need lots of cycles, but don't need the very low latency of InfiniBand,” he added.
Among the questions ICRAR is testing on AWS is how best to handle images. “Some of the image files are going to be larger than 100 TB,” Vinsen explained. “You can't load them into memory, but you don't want them written back to disk.”
Their experimental work has already helped run that kind of calculation forty times faster, “by reducing the amount of information going to disk”.
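The general idea of working on a file too big for memory, without writing intermediate results back to disk, can be shown with a toy example. This is a minimal sketch and not ICRAR's code: it assumes a flat file of raw 32-bit floats in native byte order whose size is a multiple of four bytes, and the names `stream_stats` and `write_demo_file` are invented for illustration. It reads a fixed-size chunk at a time and keeps only a handful of running totals in memory.

```python
import array
import os
import tempfile


def stream_stats(path, chunk_values=1024):
    """Return (count, minimum, maximum, mean) for a file of raw float32 values.

    Only chunk_values values are held in memory at once, and nothing is
    written back to disk. Assumes the file size is a multiple of 4 bytes.
    """
    item_size = array.array("f").itemsize
    count = 0
    total = 0.0
    lowest = float("inf")
    highest = float("-inf")
    with open(path, "rb") as handle:
        while True:
            raw = handle.read(chunk_values * item_size)
            if not raw:
                break  # end of file
            chunk = array.array("f")
            chunk.frombytes(raw)
            # Fold this chunk into the running totals, then discard it.
            count += len(chunk)
            total += sum(chunk)
            lowest = min(lowest, min(chunk))
            highest = max(highest, max(chunk))
    mean = total / count if count else float("nan")
    return count, lowest, highest, mean


def write_demo_file(values):
    """Write values as raw float32 to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".f32")
    with os.fdopen(fd, "wb") as handle:
        array.array("f", values).tofile(handle)
    return path


if __name__ == "__main__":
    demo_path = write_demo_file([1.0, 2.0, 3.0, 4.0, 5.0])
    try:
        print(stream_stats(demo_path, chunk_values=2))
    finally:
        os.remove(demo_path)
```

Real radio astronomy image cubes use richer formats and far more involved calculations, but the trade-off is the same: each pass over the data costs I/O, so the fewer intermediate results that touch the disk, the better.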
Other work will include tests of different approaches to image cleaning, and source finding (separating real radio sources within a given cube of space from noise or computing artefacts).
As Vinsen explained to The Australian, a $AU700 per month price tag is making AWS very attractive for this kind of work. ®