OpenStack's no science project, but does 'need to be glued together'
National Computational Infrastructure's Andrew Howard shares his experience running OpenStack at scale
Big science, big network
The old model, where “remote” simply meant the user accessed a network to log into a central resource, is well and truly gone. Now, someone working with data from CERN, or astronomical data, wants to get data from a source, move it to where their application is, run the application, get results, share the results, and so on.
“The drive to commodity into networking equipment, vendors offering low-cost switches, all combined with an SDN control plane, offer significant savings,” Howard said, but it's the sophistication that's demonstrated the importance of SDN to the NCI.
“The real cost advantage for us is being able to provide flexibility across multiple national facilities.”
The SDN control plane means NCI's users don't need to somehow get visibility into the telecommunications carrier, he said: “We can build sophisticated internal networks for our researchers, which can transparently span across AARNet, without having to [reach] into the carrier control plane”.
The SDN capabilities supported by AARNet are important, because the capacity a researcher uses becomes a function of the application – which is oddly analogous to how the world worked in the 1990s: “Go back 15 years, to the days of the ISDN PRI or ATM [primary rate interface and asynchronous transfer mode, respectively – The Register]. You could dial up the capacity you needed on the fly.
“This is exposed as a fairly cost-effective way of doing it in software.”
Since the NCI's work a decade back on building grids, “most of which have evolved into clouds”, OpenStack has provided an important part of the infrastructure that let it turn those grids into workloads running in virtual machines.
Software stacks “take time to reach maturity”, Howard told us, but OpenStack's regular schedule of upgrades over the three years the NCI has had it in production, plus input from players like Rackspace, have taken it far beyond “science project” status.
Nonetheless, Howard says “it is a complex set of software that needs to be glued together”. While you “can't start it in the garage and use it in production straight away”, a small installation is a perfectly viable way to learn OpenStack, he said.
And for the user, the skill benefits are clear: “regardless of which virtualisation platform someone learns, those skills are easily transportable into something like VMware with a little extra training.”
Rather than OpenStack being a painfully difficult environment to use, Howard said, the NCI's attention has been on “trying to balance which features we make available at what time, how we take advantage of those features, and how to train the researchers to take the best advantage of that environment.”
To make life as easy as possible for the researchers, a lot of effort goes into the dashboard in front of them.
“One [use case] is a typical virtual infrastructure-as-a-service platform – a standard OpenStack platform in which you spin up VMs and build applications on top of that.”
Then there are more advanced use cases, such as creating virtual laboratories that span different facilities or institutions. For the virtual laboratories, OpenStack spins up the elastic workloads for things like post-processing visualisations sitting alongside the NCI's HPC offerings.
“The researchers spin up the portal, drag-and-drop data into or out of their facility, click a box, and run their workflows. We do the provisioning – I think that's unique and different.”
“We're supporting large data flows: as a national science hub, we handle data from SKA, from CERN, Copernicus, earth sensing – it's a massive data flow we have coming in here.
“So we live and die on the network. For our users to reach us we have to have stability and really high performance.” ®