Hurry up, EU: CERN boffins need clouds to hunt Higgs boson
Some of the LHC data has already gone cloudy
CCWF2012 Physics boffinry centre extraordinaire CERN would love to be processing its reams of research data in the cloud, if only Europe would hurry up with a regulatory framework.
Bob Jones, head of openlab at CERN, said that not having standards and a legal framework in place that applies across Europe is holding up its move to the cloud.
“Things we can do in the UK, we can’t do in Germany, which is a bit of a barrier,” Jones told The Register at the Cloud Computing World Forum in London.
CERN has 20 members, all European countries, and its data sits in 130 data centres across the world. The Large Hadron Collider (LHC) and its attached experiments generate information in mind-boggling quantities, all of which has to be processed so boffins can make their findings public.
The four major experiments at LHC – Atlas, LHCb, ALICE (A Large Ion Collider Experiment) and Compact Muon Solenoid – generate 1 petabyte of raw data per second, Jones said.
“We don’t record all that. Ninety-nine per cent is data we’ve seen before and we throw it away; the [other] 1 per cent and knowing which 1 per cent [it is] is IT’s job,” he added.
Despite binning so much of the data, CERN still ends up with around 15 million GB (15 petabytes) of data a year, and processing it requires computing power equivalent to around 100,000 of today’s fastest computers. That data is stored at CERN’s core 3MW data centre, soon to be supplemented by another centre of similar size that the organisation is building in Hungary.
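To put the 15 million GB a year in more familiar terms, here is a rough back-of-the-envelope sketch. It assumes decimal units (1 GB = 10^9 bytes) and spreads the volume evenly over a full calendar year; both are assumptions for illustration rather than figures from Jones, and the function name is hypothetical.

```python
# Back-of-the-envelope conversion of the yearly stored volume quoted above.
# Assumptions (not from the article): decimal units, and data spread evenly
# over a 365-day year, although the collider does not run all year round.

GB = 10**9                          # bytes in a decimal gigabyte
PB = 10**15                         # bytes in a decimal petabyte
SECONDS_PER_YEAR = 365 * 24 * 3600


def yearly_volume_summary(gigabytes_per_year):
    """Return (petabytes per year, average megabytes per second)."""
    total_bytes = gigabytes_per_year * GB
    petabytes = total_bytes / PB
    avg_mb_per_second = total_bytes / SECONDS_PER_YEAR / 10**6
    return petabytes, avg_mb_per_second


if __name__ == "__main__":
    pb, rate = yearly_volume_summary(15_000_000)
    print(f"{pb:.0f} PB a year, about {rate:.0f} MB/s on average")
```

Under those assumptions the stored data works out to 15 PB a year, or a sustained average of roughly 476 MB/s being written to storage.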
Jones said CERN has to move with the times and see what cloud can do to handle its big data. The research body is currently in its first year of a two-year pilot phase testing European cloud computing platform Helix Nebula – aka “The Science Cloud”.
CERN, the European Space Agency and the European Molecular Biology Lab are working together to use Helix Nebula as a cloudy testing ground. “At the moment, we’re looking more at infrastructure as a service,” Jones said, “starting at the bottom and working our way up.”
At the moment, all of CERN’s data centres are publicly funded, so it wants to see if it can start using commercial data centres for additional capacity.
The pilot phase will run until the end of 2014, and CERN will use it to assess a whole host of issues that moving science into the cloud could raise.
“We will be looking at security, reliability, data privacy, scalability, network performance, integration, vendor lock-in, legal concerns and transparency,” Jones said.
“If that works, we want people to tender at the end of the pilot phase.”
For the pilot, CERN has put part of its Atlas experiment in the cloud, while the ESA is doing some of its Earth observation work and the European Molecular Biology Lab is looking at genomic assembly using Helix Nebula.
As well as the general issues with any move to the cloud, there are also some issues specific to CERN.
“Each experiment at LHC isn’t obliged to use all 130 data centres; they have their favourites,” Jones said. “We need to make sure that data placement is still possible.”
Each LHC shutdown, the next of which is scheduled to last 18 months from 2013, is an opportunity for upgrades to both the science and the tech.
“In 2013, we’ll be doing upgrades around Oracle, Intel MIC architecture and the network,” Jones said.
“At the same time, we’ll be looking towards the next shutdown in 2017 or 2018 and we’re working now with companies like Oracle and Intel on new techs we could use in the front end... for example, silicon photonics.”
Unlike most folks who want to go into the cloud, CERN isn’t looking so much at the bottom line.
“When you already have large data centres and sustained need for capacity, you might not be seeing that much saving,” Jones pointed out.
But there are other benefits for CERN. As well as the obvious – access to more capacity – a move to the cloud could also help speed up procurement processes, which can take up to two years right now. And CERN is part of the public sector, which has to try to come up with a good return on investment.
While Helix Nebula is in its pilot phase, Jones is hoping to find more scientific organisations that want to join forces in the cloud, as well as more suppliers for healthy competition.
Jones anticipates the LHC will be running for 20 years, but he wouldn’t say whether he thought that was enough time to find the Higgs boson.
“That’s not all CERN does!” he laughed.
“But if it’s not discovered, that would be interesting; it would mean that the standard model’s wrong so we’ll have to come up with a new standard model!”
Whether the LHC finds the Higgs boson or not, it will still have done an important job.
“We need to know where the physics models are going to see what physics tools we’ll need next,” he said. ®