OpenSFS to fund Lustre HPC file system development
If anyone forks the code, it will be Oracle
Over half of the top HPC centres in the world are using the Lustre file system, so the latest move to fund its development should make a lot of people happy.
And there will be a lot of brainpower and resources backing up the move: namely, supercomputer maker Cray, HPC storage vendor Data Direct Networks, and two nuke labs funded by the US Department of Energy - the Lawrence Livermore National Laboratory and Oak Ridge National Laboratory.
Cray, Data Direct Networks and the Lawrence Livermore and Oak Ridge labs have pooled resources to create Open Scalable File Systems. OpenSFS, a non-profit based in California, will fund the development of the Lustre parallel file system.
The Lustre file system is an open source project under the control of Oracle, if any open source project can be said to be under anyone's control. Of the top 100 HPC centres in the world, 60 are using the Lustre file system to feed data to their supers, according to Galen Shipman, technology integration group leader at Oak Ridge and a board member of OpenSFS.
The problem is not that Oracle has stopped developing Lustre - the company put the finishing touches on Lustre 2.0 back in August - but rather that Oracle is not as interested in the HPC market as Sun Microsystems was, and it has no intention of offering commercial-grade support for Lustre 2.0. It is, however, still offering support for the earlier Lustre 1.8 release. This is a big problem for HPC labs.
You might think that the big nuke labs already have the smartest people in the world working for them and that, if anything, it should be they who offer tech support to the rest of the HPC community for products such as Lustre. I certainly thought that, based on the raw brainpower at these labs. Lawrence Livermore was also where a lot of the original Lustre file system work was funded by the DOE and put into production to put it through its paces. But Mark Seager, Lawrence Livermore's assistant department head for advanced technology - and an OpenSFS board member - said this is not what Lustre customers want.
While the hotshot HPC shops using Lustre are able to handle level one tech support fine, and can even wade in and offer level two support, supplying bug fixes for the easy stuff, that's about as far as it goes.
"We need that third level of support backing us up for deep problems," explains Seager. "We do not have the manpower to do that deep support."
The need for deep and official support, and a product roadmap and development process that could accept the input and requirements from all the customers and shape future releases to address their needs, was why Peter Braam, a researcher at Carnegie Mellon University and founder of the Lustre project, created Cluster File Systems in 2001. Sun Microsystems bought CFS in 2007, and Oracle ate Sun in January.
The situation around Lustre was murky enough that Whamcloud, a startup founded with $10m in private equity funding and some of the top people involved with the development of Lustre, burst onto the scene at the end of July to chase Lustre support contracts, do development on the parallel file system, and submit code back into the Oracle-controlled Lustre code base.