Square Kilometre Array data heading for AWS cloud

New Oz supers vaporising fast, in a good way

Whether it's weather or Earth or whether an object exists in extra-galactic space, cloud computing disciplines are changing the face of research at facilities such as the National Computational Infrastructure (NCI) and the International Centre for Radio Astronomy Research (ICRAR).

The National Computational Infrastructure, which during June tested the limits of its brand-new Fujitsu super (Raijin), is adding what it described to Vulture South as the “third apex of the triangle”, slinging $AU2m in the direction of Dell for a 3,200-core compute cloud. Dr Joseph Antony, manager of cloud, online, and data-intensive compute at the NCI, explained to The Register that the nearly 60,000-core Raijin supercomputer and the petascale data store form the other two corners.

Dr Antony explained that the cloud node targets “workloads that don't suit” Raijin, but need access to the same data store.

“The key thing is that the cloud node can see the high-performance computational resources, as well as the petascale data store,” he explained. For example, the pre-processing and post-processing of climate data and satellite images are better offloaded, away from the supercomputer.

“For example, Australia's Bureau of Meteorology might run a large climate simulation in Raijin, deliver the output to the Lustre filesystem, and then you could have post-processing steps. The data could be ingested by a cloud workflow that would prepare it for dissemination around the world.”

Workloads not suited to Raijin, he said, are characterised by very high I/O operations – “disk-intensive, very 'seeky', very random” – in fields like bioinformatics or satellite image processing.

To make sure it can handle those kinds of loads, the NCI cloud has 150 TB of SSD storage and 56 Gbps of Mellanox Ethernet.

The Dell cloud will also allow NCI to present virtual instances of Raijin to the research community around the country. NCI director Professor Lindsay Botten said that for scientists who need the Raijin software environment but not the super-high-performance backplane, Dr Antony's group will be building “an operating environment that is essentially Raijin in a box,” so that researchers will “see the same software library and the same compute environment”.

ICRAR using AWS to help design supers

Although ICRAR is taking a completely different approach to its use of the cloud, there is a common thread in the considerations: an interest in the way different workloads align with different environments.

As Professor Andreas Wicenec explained to The Register, that's going to be of critical importance in the design of the environments that crunch the numbers for the Square Kilometre Array project that is beginning to take shape with the recent switch-on of the Murchison Widefield Array in Western Australia.

Even the MWA is shipping impressively large amounts of data to the Pawsey Centre in Perth, around 400 megabytes per second, and will be piling up three petabytes of data each year at Pawsey.
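
As a rough sanity check on those two figures, the arithmetic below converts between a sustained data rate and a yearly volume. It is a sketch only: it assumes decimal units (1 MB = 10^6 bytes, 1 PB = 10^15 bytes) and a 365-day year, and the function names are illustrative, not anything used by the MWA or the Pawsey Centre.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: non-leap year


def petabytes_per_year(rate_mb_per_s: float) -> float:
    """Yearly volume in PB if the given MB/s rate ran around the clock."""
    return rate_mb_per_s * 1e6 * SECONDS_PER_YEAR / 1e15


def average_rate_mb_per_s(pb_per_year: float) -> float:
    """Year-round average rate in MB/s implied by a yearly volume in PB."""
    return pb_per_year * 1e15 / SECONDS_PER_YEAR / 1e6


sustained_pb = petabytes_per_year(400)      # 400 MB/s, never pausing
average_mb_s = average_rate_mb_per_s(3)     # 3 PB spread over a full year
print(f"{sustained_pb:.1f} PB/year, {average_mb_s:.0f} MB/s average")
```

Under those assumptions, 400 MB/s around the clock would come to roughly 12.6 PB a year, while three petabytes a year works out to an average of about 95 MB/s. One possible reading is that the 400 MB/s figure is a rate while observing rather than a year-round average, but the article does not say.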

Some of the architectural considerations are already known: neither networks nor storage will cope with keeping every bit of raw data that flows from the SKA, so there will have to be pre-processing to decide what to keep; and at each stage – from the SKA site to the computing infrastructure to the rest of the world – people like Professor Wicenec will have to specify the architecture to best serve the project.

“That's what we're figuring out for different science cases,” he told The Register. “The baseline design is being fixed at this time, but there are all kinds of unknowns. There is a fair bit of discussion of what computing will be needed onsite, for example.

“Some of the work might involve FPGAs or ASICs, while other workloads might need GPUs or accelerator technologies.”

As an input to the detailed design work, ICRAR has also called on cloud computing – but instead of building a cloud, it's buying one, most specifically as a customer of Amazon Web Services. Part of the reason for this, according to research associate professor Kevin Vinsen, is to use AWS as the basis for various experiments that help understand where cloud architectures can help ICRAR, so that information can feed back into the design process.

“Each six months, we might be allocated a few hundred CPU hours,” Professor Vinsen said – making it important not to waste that time on the wrong workloads. “Sometimes you just want to run up an experiment, then shut it down rather than waiting for our allocation.”

And the cloud environment has already proved valuable for “things that need lots of cycles, but don't need the very low latency of Infiniband,” he added.

Among the questions ICRAR is testing against the AWS environment is how best to handle images. “Some of the image files are going to be larger than 100 TB,” Vinsen explained. “You can't load them into memory, but you don't want them written back to disk.”

Their experimental work has already helped run that kind of calculation forty times faster, “by reducing the amount of information going to disk”.
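
The general idea of working on data that is too large for memory without writing intermediate results to disk can be sketched as a single streaming pass. This is an illustration only, not ICRAR's code: the little-endian float32 pixel layout, the chunk size, and the names `read_chunks` and `streaming_stats` are all assumptions made for the example.

```python
import io
import struct
from typing import BinaryIO, Iterator, Tuple


def read_chunks(stream: BinaryIO, chunk_bytes: int) -> Iterator[bytes]:
    """Yield fixed-size blocks so only one block is held in memory at a time."""
    while True:
        block = stream.read(chunk_bytes)
        if not block:
            return
        yield block


def streaming_stats(stream: BinaryIO, chunk_bytes: int = 1 << 20) -> Tuple[int, float, float]:
    """Return (count, mean, max) of float32 pixels in one pass over the stream.

    Only running totals are kept; nothing intermediate is written to disk.
    """
    if chunk_bytes % 4 != 0:
        raise ValueError("chunk_bytes must be a multiple of 4 (float32 size)")
    count, total, peak = 0, 0.0, float("-inf")
    for block in read_chunks(stream, chunk_bytes):
        if len(block) % 4 != 0:
            raise ValueError("stream length is not a whole number of float32 values")
        values = struct.unpack(f"<{len(block) // 4}f", block)
        count += len(values)
        total += sum(values)
        peak = max(peak, max(values))
    if count == 0:
        raise ValueError("empty stream")
    return count, total / count, peak


# Tiny in-memory stand-in for an image file, read 8 bytes (two pixels) at a time.
demo = io.BytesIO(struct.pack("<6f", 1, 2, 3, 4, 5, 6))
print(streaming_stats(demo, chunk_bytes=8))
```

A real image cube would need a format-aware reader and far more involved processing than a mean and a maximum; the point of the sketch is only that memory use is bounded by the chunk size rather than the file size.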

Other work will include tests of different approaches to image cleaning, and source finding (separating real radio sources within a given cube of space from noise or computing artefacts).

As Vinsen explained to The Australian, a $AU700 per month price tag is making AWS very attractive for this kind of work. ®
