Seagate connects Hadoop and Lustre in an open sourcery ceremony
Streamlined workflows and easier processing for Hadoop-using apps
Seagate has written a Hadoop connector for Lustre, meaning Hadoop-using systems can now fetch data directly from a Lustre parallel file system array. It is a small contribution by the US data storage company to the open source world.
The Hadoop on Lustre Connector (HoLC) means that data stored on a Lustre system doesn’t need copying from that data store to an HDFS store before Hadoop-using applications can process it.
Hadoop tools such as Mahout, Hive and Pig can use a Lustre filesystem.
Seagate is releasing patch source code for Hadoop that enables diskless Hadoop clusters to access data on a Lustre HPC-style data store. Overall, Seagate claims HoLC can streamline Hadoop workflows.
Separately, Seagate is transferring assets relating to Lustre.org to OpenSFS (Open Scalable File Systems) and EOFS (European Open Filesystem SCE), arguing that these two are trusted stewards of the Lustre community.
Seagate is far from walking away, though: it contributes to OpenSFS at the highest ‘Promoter’ level and still sits on its board.
Another example of its open source credentials, it says, was making its Ethernet Drive (Kinetic) interface specification and T-Card developer adapter available to the Open Compute Project in January this year. ®