CERN's boson hunters tackle big data bug infestation
It's the software or the science that's been wrong
Tens of thousands of bugs have been eliminated from the program CERN's atom-smashers are using to identify the Higgs boson – just don't expect an answer to life, the universe and everything anytime soon.
CERN says it has squashed 40,000 bugs living in ROOT, the C++ framework it relies upon to store, crunch and help analyse petabytes of data from the Large Hadron Collider (LHC). The massive collider generates 15PB of data each year from 600 million proton collisions per second.
ROOT contains 3.5 million lines of code, while CERN's army of 10,000 physicists has surrounded that core with a further 50 million lines of software they have built to try to sift the Higgs boson out of the petabytes. The Higgs boson is the particle that theoretically gives mass to all other particles, but it has to date proved elusive.
The bugs have lived in ROOT since the data-munching framework came online in 1995, and were finally winkled out only by applying commercially available static code analysis tools from development testing specialist Coverity.
CERN reckons the bugs had helped muddy results from the LHC, throwing them off the Higgs boson scent. Further, there were programs built by those 10,000 scientists that could never be properly tested before Coverity arrived.
CERN and its physicists had relied on various testing approaches and in-house tools for 16 years, including unit testing, to identify bugs. Axel Naumann, a member of the CERN ROOT development team, told us the tools weren't up to scratch. They generated too many false positives, might not generate enough warnings, and lacked the features to help zero in on bugs, instead producing pages of reports to wade through. The existing tools also required CERN's six-person tech team working on ROOT to re-create the exact conditions that had produced a problem – a near impossible task.
Naumann told us: "A lot of people have relied on their unit tests – they believe this makes them safe. It does but it’s not enough. We’d done unit testing; we do tests all the way – we need to quantify the number of bugs and know the effect they will have on our results."
So does this now mean ROOT is bug-free? "Software doesn't work that way!" Naumann says with a big laugh. "There’s always something that’s hiding."
Big question, then: is CERN now closer to actually finding the Higgs boson? Another big laugh. "You know, nature is still allowed to play tricks on us. We were hoping it would be easier. Maybe we were unlucky, or maybe we were lucky and we just need to find new physics to identify it!"
That would be a "not necessarily" then. ®
So they didn't even use LINT. My experience of scientists' code compared to professional programmers' code is that the scientists' code is extremely sloppy and slapdash. The equivalent of someone building their first house and a professional builder building their umpteenth house.
I suppose it wouldn't be the C++?
My experience of C++ -- which goes back to the early 90s -- is that it's a very powerful tool that's responsible for pretty much every large scale screwup in modern software design (plus the inevitable software bloat). I describe giving a typical programmer this tool as "a bit like giving a toddler a chainsaw as a Christmas present". I also think that object methodology is seriously overused; it's all that gets taught so we're stuck with the "if all you've known is a hammer then everything looks like a nail".
Now, rather than making the typical programmer statement "it's buggy because it's got x million lines of code in it" we should be asking why it's so big, why it doesn't break down into testable components and so on. Ordinary, everyday stuff that I will admit seems to be elusive to our Windows brethren (Microsoft doesn't go out of their way to make their stuff easy to work with, IMHO) but absolutely essential if you're doing serious work such as embedded design.
I find professional programmers -- CS majors -- among the worst offenders because they only know their coding abstractions, they see the code as the goal rather than it being a model of some thing or process.
I'm tired of reading
That C++ is dangerous, that C++ is like a chainsaw, etc.
Any language can engender a bloody mess, or a fine piece of digital crafting.
Be it VB6, .NOT, Delphi, Perl, Bash, or C/C++.
I have seen software being written in VB6 which was both well designed and elegant, and I have seen true C++ abortions of nature, and the other way around.
It is the developer who makes the difference here.