Lustre file system set for spit 'n' polish

Whamcloud lands deal from OpenSFS

Whamcloud, the startup created in July 2010 to continue development of the open source Lustre supercomputer file system, has secured a $2.1m contract from OpenSFS to spruce it up with new features and functions.

Lustre – used on about 60 per cent of the largest supercomputers in the world – is a parallel clustered file system designed both to hold petabytes of files and to give high-speed access to the data stored on it. Lustre was created by Peter Braam when he was a researcher at Carnegie Mellon University, and was commercialized when he created Cluster File Systems in 2001.

Sun Microsystems bought CFS in 2007 to marry it with its Zettabyte File System, with the aim of going after the HPC market. However, Oracle ate Sun in 2010, and Lustre did not really have a place in Oracle's plans – conventional supercomputer clusters are not a business that Oracle is interested in. Oracle continues to offer support for Lustre 1.8, but does not offer support for the current Lustre 2.0 release.

Whamcloud was formed by a bunch of Lustre enthusiasts who hail from the US Department of Energy, which funds a lot of supercomputer research, and from Sun/Oracle where they worked on Lustre. The company received $10m in funding in September 2010, and this January added a bunch more Lustre experts from Oracle to its staff.

Even with that seed funding, Whamcloud needs to make some money, and OpenSFS is there to pass around the hat and collect both money for Lustre and feature requirements from HPC centers and vendors in the community.

Norman Morse, the CEO at OpenSFS and formerly the data center manager at Los Alamos National Laboratory (a big DOE nuke lab), tells El Reg that members of OpenSFS have to pay $500,000 in dues per year, so the initial five members have kicked in $2.5m thus far, with a total of $4m to be raked in for Lustre development within the next several months.

With money in hand and requirements building up, OpenSFS has engaged Whamcloud in a multiyear contract worth $2.1m to get to work on Lustre enhancements. The deal, says Morse, is structured so the contributors to the non-profit OpenSFS – which include supercomputer maker Cray, HPC storage vendor DataDirect Networks, Lawrence Livermore National Laboratory, and Oak Ridge National Laboratory – will agree on and prioritize features and pay for their development as they are delivered for Lustre.

The effort will require somewhere between five and ten programmers, and the big problem area right now is the metadata server that underpins Lustre. At the moment, Lustre has only a single metadata server, which makes it a bottleneck on file serving, so Whamcloud's Lustre programmers will parallelize the metadata server to boost its capacity and throughput. Two other features to be added to Lustre are a distributed name space and a file-system checker.

Brent Gorda, who was cutting the DOE checks for Lustre development many years ago, is Whamcloud's CEO, and he says that while Lustre is known for high performance, it's missing a lot of features, such as replication. The file system has been deployed at the big government and academic HPC labs and commercialized here and there somewhat successfully, and has its users on Wall Street, in big pharma, and in the oil and gas industry.

If you are looking for work in the IT sector, Gorda says that becoming an expert in Lustre is probably not a bad idea. "Lustre programmers and administrators are in high demand," says Gorda. "The technology has completely rebounded and people are starting to use it again." ®
