CERN publishes massive data set
Take a bite of some seriously big data
The CMS Collaboration at CERN has dropped its biggest data publication ever: more than 300 terabytes of particle collisions and accompanying analysis.
Unless you've got access to a mighty broadband account, forget this one: at 25Mbps download speed, you'll need a little over three years to have the full dataset. And that's before you buy enough disks for the job (by our quick calculation, upwards of US$8,000 worth).
Universities, however, do have the kinds of connections they need, since the spread of research networks across the world means 100Gbps links are not uncommon. Even AARNet in distant Australia has a 40Gbps trans-Pacific link to America.
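For anyone who wants to check the sums, here's a back-of-the-envelope sketch of the transfer times at the link speeds mentioned above. It assumes decimal terabytes (10^12 bytes), a fully saturated link and no protocol overhead, so real-world transfers would take longer; the function name `transfer_days` is ours, purely for illustration.

```python
def transfer_days(size_tb: float, link_mbps: float) -> float:
    """Days needed to move size_tb terabytes over a link of link_mbps megabits/s.

    Assumes decimal units (1 TB = 1e12 bytes, 1 Mbps = 1e6 bits/s) and a link
    that runs flat out the whole time, which is an idealisation.
    """
    bits = size_tb * 1e12 * 8              # terabytes -> bits
    seconds = bits / (link_mbps * 1e6)     # bits / (bits per second)
    return seconds / 86_400                # seconds -> days


# Link speeds taken from the article: home broadband, AARNet, research networks.
for label, mbps in [("25Mbps", 25), ("40Gbps", 40_000), ("100Gbps", 100_000)]:
    days = transfer_days(300, mbps)
    print(f"{label}: {days:,.2f} days ({days * 24:,.1f} hours)")
```

Under those assumptions, the 25Mbps case works out at roughly 1,111 days, which is where "a little over three years" comes from, while a 100Gbps link would get through the lot in well under a day.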
CERN explains the data includes “half the data collected at the LHC by the CMS detector in 2011”.
“CMS is also providing the simulated data generated with the same software version that should be used to analyse the primary datasets,” its announcement says, along with the protocols for generating the simulations, “analysis tools and code examples tailored to the datasets.”
“A virtual-machine image based on CernVM, which comes preloaded with the software environment needed to analyse the CMS data, can also be downloaded from the portal,” the announcement adds.
Of the 300TB or so of data, CMS says there's 100TB of proton collision data at 7 TeV (teraelectronvolts); for the physicists, that's an integrated luminosity of 2.5 inverse femtobarns.
As well as the primary datasets, there are what CMS calls “derived datasets” that target university and high-school students. ®