Square Kilometre Array data heading for AWS cloud

New Oz supers vaporising fast, in a good way

Whether it's weather or Earth or whether an object exists in extra-galactic space, cloud computing disciplines are changing the face of research at facilities such as the National Computational Infrastructure (NCI) and the International Centre for Radio Astronomy Research (ICRAR).

The National Computational Infrastructure, which during June tested the limits of its brand-new Fujitsu super (Raijin), is adding what it described to Vulture South as the “third apex of the triangle”, slinging $AU2m in the direction of Dell for a 3,200-core compute cloud. Dr Joseph Antony, manager of cloud, online, and data-intensive compute at the NCI, explained to The Register that the nearly 60,000-core Raijin supercomputer and the petascale data store form the other two corners.

Dr Antony explained that the cloud node targets “workloads that don't suit” Raijin, but need access to the same data store.

“The key thing is that the cloud node can see the high-performance computational resources, as well as the petascale data store,” he explained. The pre-processing and post-processing of climate data and satellite images, for instance, are better offloaded away from the supercomputer.

“For example, Australia's Bureau of Meteorology might run a large climate simulation in Raijin, deliver the output to the Lustre filesystem, and then you could have post-processing steps. The data could be ingested by a cloud workflow that would prepare it for dissemination around the world.”

Workloads not suited to Raijin, he said, are characterised by very high I/O operations – “disk-intensive, very 'seeky', very random” – in fields like bioinformatics or satellite image processing.

To make sure it can handle those kinds of loads, the NCI cloud has 150 TB of SSD storage and 56 Gbps of Mellanox Ethernet.

The Dell cloud will also allow NCI to present virtual instances of Raijin to the research community around the country. NCI director Professor Lindsay Botten said that for scientists who need the Raijin software environment but not the super-high-performance backplane, Dr Antony's group will be building “an operating environment that is essentially Raijin in a box,” so that researchers will “see the same software library and the same compute environment”.

ICRAR using AWS to help design supers

Although ICRAR is taking a completely different approach to its use of the cloud, there is a common thread in the considerations: an interest in the way different workloads align with different environments.

As Professor Andreas Wicenec explained to The Register, that alignment is going to be of critical importance in designing the environments that crunch the numbers for the Square Kilometre Array project, which is beginning to take shape with the recent switch-on of the Murchison Widefield Array in Western Australia.

Even the MWA is shipping impressively large amounts of data to the Pawsey Centre in Perth: around 400 megabytes per second, and it will be piling up three petabytes of data each year at Pawsey.
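
As a rough consistency check on those two figures, the arithmetic can be sketched in a few lines of Python. This is a back-of-envelope calculation, not anything from ICRAR: it assumes decimal units (1 MB = 10^6 bytes, 1 PB = 10^15 bytes) and a 365-day year, and the function name and `duty_cycle` parameter are made up for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 s, ignoring leap years


def petabytes_per_year(rate_mb_per_s: float, duty_cycle: float = 1.0) -> float:
    """Yearly volume in decimal petabytes for a sustained rate in MB/s.

    duty_cycle is the (hypothetical) fraction of the year the stream is active.
    """
    bytes_per_year = rate_mb_per_s * 1e6 * SECONDS_PER_YEAR * duty_cycle
    return bytes_per_year / 1e15


# Volume if the quoted 400 MB/s flowed around the clock
continuous = petabytes_per_year(400)

# Fraction of that volume represented by the quoted three petabytes a year
implied_fraction = 3 / continuous

print(f"continuous: {continuous:.1f} PB/year")          # continuous: 12.6 PB/year
print(f"implied fraction: {implied_fraction:.0%}")      # implied fraction: 24%
```

On these assumptions, three petabytes a year is roughly a quarter of what an uninterrupted 400 MB/s stream would produce. The article does not say whether the difference comes from observing time, data reduction before archiving, or something else.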

Some of the architectural considerations are already known: neither networks nor storage will cope with keeping every bit of raw data that flows from the SKA, so there will have to be pre-processing to decide what to keep; and at each stage – from the SKA site to the computing infrastructure to the rest of the world – people like Professor Wicenec will have to specify the architecture to best serve the project.

“That's what we're figuring out for different science cases,” he told The Register. “The baseline design is being fixed at this time, but there are all kinds of unknowns. There is a fair bit of discussion of what computing will be needed onsite, for example.

“Some of the work might involve FPGAs or ASICs, while other workloads might need GPUs or accelerator technologies.”

As an input to the detailed design work, ICRAR has also called on cloud computing – but instead of building a cloud, it's buying one, specifically as a customer of Amazon Web Services. Part of the reason for this, according to research associate professor Kevin Vinsen, is to use AWS as the basis for various experiments that help ICRAR understand where cloud architectures can be useful, so that information can feed back into the design process.

“Each six months, we might be allocated a few hundred CPU hours,” Professor Vinsen said – making it important not to waste that time on the wrong workloads. “Sometimes you just want to run up an experiment, then shut it down rather than waiting for our allocation.”

And the cloud environment has already proved valuable for “things that need lots of cycles, but don't need the very low latency of Infiniband,” he added.

ICRAR's tests against the AWS environment include experiments with different ways of handling images. “Some of the image files are going to be larger than 100 TB,” Vinsen explained. “You can't load them into memory, but you don't want them written back to disk.”

Their experimental work has already helped run that kind of calculation forty times faster, “by reducing the amount of information going to disk”.
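
The article does not describe how ICRAR achieved that speed-up. As a general illustration of the idea of working through an image too large for memory without writing intermediates back to disk, here is a minimal Python sketch that reads pixels in fixed-size chunks and keeps only running totals. The raw little-endian float32 layout, the function names and the chunk size are all assumptions made for the example, and it expects a buffered stream such as an ordinary file or `io.BytesIO`.

```python
import io
import struct
from typing import BinaryIO, Iterator, Tuple


def read_chunks(stream: BinaryIO, values_per_chunk: int) -> Iterator[Tuple[float, ...]]:
    """Yield successive tuples of little-endian float32 values from a stream."""
    chunk_bytes = values_per_chunk * 4
    while True:
        raw = stream.read(chunk_bytes)
        count = len(raw) // 4
        if count == 0:  # end of stream (or fewer than 4 trailing bytes)
            break
        yield struct.unpack(f"<{count}f", raw[: count * 4])


def streaming_stats(stream: BinaryIO, values_per_chunk: int = 1024) -> Tuple[int, float, float]:
    """Return (pixel count, mean, peak) while holding only one chunk in memory."""
    count = 0
    total = 0.0
    peak = float("-inf")
    for chunk in read_chunks(stream, values_per_chunk):
        count += len(chunk)
        total += sum(chunk)
        peak = max(peak, max(chunk))
    mean = total / count if count else 0.0
    return count, mean, peak


# Tiny stand-in for a huge image: six pixels, read four at a time
image = io.BytesIO(struct.pack("<6f", 1.0, 2.0, 3.0, 4.0, 5.0, 6.0))
print(streaming_stats(image, values_per_chunk=4))  # (6, 3.5, 6.0)
```

Memory use here depends on the chunk size rather than the file size, and nothing is written out between chunks. Real pipelines would need far more than a mean and a peak, but the pattern is the same.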

Other work will include tests of different approaches to image cleaning, and source finding (separating real radio sources within a given cube of space from noise or computing artefacts).
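
To give a feel for what source finding involves, the toy Python sketch below flags pixels that stand well above an estimated noise level. It is a simple threshold test chosen for illustration, not the approach ICRAR is testing: the function name, the 5-sigma default and the sample values are invented, and the noise is estimated from the median absolute deviation on the assumption that it is roughly Gaussian.

```python
import statistics
from typing import List, Sequence


def find_candidates(pixels: Sequence[float], n_sigma: float = 5.0) -> List[int]:
    """Return indices of pixels more than n_sigma above the median.

    The noise level is estimated robustly from the median absolute
    deviation (MAD), scaled by 1.4826 to approximate a Gaussian sigma.
    """
    med = statistics.median(pixels)
    mad = statistics.median(abs(p - med) for p in pixels)
    sigma = 1.4826 * mad
    threshold = med + n_sigma * sigma
    return [i for i, p in enumerate(pixels) if p > threshold]


# Mostly noise around zero, with one bright pixel at index 5
pixels = [0.1, -0.2, 0.0, 0.3, -0.1, 9.0, 0.2, -0.3, 0.1, 0.0]
print(find_candidates(pixels))  # [5]
```

Using the median rather than the mean keeps the bright pixel from inflating the noise estimate. Telling genuine sources apart from computing artefacts is a much harder problem than this thresholding suggests.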

As Vinsen explained to The Australian, a $AU700 per month price tag is making AWS very attractive for this kind of work. ®
