Scality commits to Big Data, puts a RING on Hadoop elephant
Also adds plug-in for OpenStack's Cinder
Object storage start-up Scality has added its storage to Hadoop so users can avoid loading data through Hadoop's own file system. It has also unveiled a plug-in for Cinder, the block storage layer within the OpenStack project.
The RING is an object storage infrastructure based on a set of x86 server nodes that store objects, not files or blocks, and can operate in parallel.
Scality has produced what it calls a "production-grade Hadoop storage implementation" using CDMI (Cloud Data Management Interface), the cloud storage standard developed and promoted by the SNIA. CDMI support by vendors started slowly but is picking up pace.
Scality has replaced the Hadoop NameNode server with its own metadata architecture, thereby eliminating the single point of failure in Hadoop’s architecture. The company says its implementation of Hadoop enables in-place processing, with compute running on the storage node itself, and significantly reduces the need for data transfer because it can share data location with the JobTracker.
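To illustrate why sharing data location with the scheduler cuts data transfer, here is a minimal sketch of locality-aware task placement. It is not Scality's or Hadoop's actual scheduler: the function names, the node names and the simple "first free local node" policy are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


def pick_node(replica_nodes: List[str], free_slots: Dict[str, int]) -> Optional[str]:
    """Prefer a node that already stores the data; otherwise use any free node."""
    # First choice: a node holding the data, so nothing crosses the network.
    for node in replica_nodes:
        if free_slots.get(node, 0) > 0:
            return node
    # Fallback: any node with a free slot (the data must then be transferred).
    for node, slots in free_slots.items():
        if slots > 0:
            return node
    return None


def schedule(tasks: Dict[str, List[str]], free_slots: Dict[str, int]) -> Dict[str, Optional[str]]:
    """Place each task given the (hypothetical) list of nodes storing its input."""
    slots = dict(free_slots)  # work on a copy so the caller's dict is untouched
    placement: Dict[str, Optional[str]] = {}
    for task, replica_nodes in tasks.items():
        node = pick_node(replica_nodes, slots)
        if node is not None:
            slots[node] -= 1
        placement[task] = node
    return placement
```

In this sketch a task runs where its input lives whenever that node has a free slot, and is placed elsewhere only when it does not.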
Scality says that its RING's erasure coding means any Hadoop hardware overhead due to replication is obviated. Also "users can write and read files through a standard file system, and at the same time process the content with Hadoop, without needing to load the files through HDFS, the Hadoop Distributed File System".
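The hardware overhead argument can be made concrete with a little arithmetic. The sketch below compares raw capacity needed per byte of user data under full-copy replication and under erasure coding. The three-copy figure is the HDFS default replication factor; the 10 data plus 4 parity fragment layout is purely an illustrative assumption, not a parameter Scality has stated.

```python
def replication_overhead(copies: int) -> float:
    """Raw bytes stored per byte of user data when keeping full copies."""
    return float(copies)


def erasure_overhead(data_fragments: int, parity_fragments: int) -> float:
    """Raw bytes stored per byte of user data with k data + m parity fragments."""
    return (data_fragments + parity_fragments) / data_fragments


def raw_capacity_tb(user_data_tb: float, overhead: float) -> float:
    """Raw capacity needed to hold a given amount of user data."""
    return user_data_tb * overhead


if __name__ == "__main__":
    # HDFS default: three full copies of every block.
    print(raw_capacity_tb(100, replication_overhead(3)))
    # Hypothetical 10+4 erasure-coded layout (assumed for illustration).
    print(raw_capacity_tb(100, erasure_overhead(10, 4)))
```

Under these assumed figures, 100TB of user data would need 300TB of raw disk with triple replication and about 140TB with the erasure-coded layout, while the latter could still survive the loss of any four fragments.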
Jerome Lecat, Scality's CEO, said: "We have contributed our Hadoop solution to the CDMI community, ensuring that it can be used with any CDMI-compatible storage. … Our CDMI framework can read data directly from our scale-out file system, it is not necessary to do an HDFS ingest prior to performing a MapReduce job."
The Scality offering is compatible with, and has been tested with, Hortonworks HDP 1.0 and Cloudera CDH4; it doesn't appear that Scality is looking to replace or compete with existing Hadoop distributions. By adding a RING back end, as it were, Scality says it produces a more cost-effective, easier-to-use, more resilient and higher-performance Hadoop infrastructure, with users benefiting from Scality's SOFS (Scale-Out File System).
Lecat said: "Our angle is that we think that people will want to be able to do Hadoop job on 'normal' data, not just what they specifically prepared for Hadoop. In my mind, this is the very advantage of Hadoop, but it is killed by the fact that people need to do an HDFS ingest before any MapReduce job. Not with us anymore."
An implication is this, Lecat says: "Just imagine what you can do if you now use MapReduce – which is working on the storage nodes themselves – to do data transformation, like new encoding, as a new versions comes out. This saves a lot of processing time. It used to be necessary to move the data from storage to a server, do the transformation and then write it back on storage."
OpenStack Object Storage
OpenStack is a cloud or Infrastructure-as-a-Service (IaaS) platform based on free, open-source software that controls pools of compute, storage and networking resources in a data centre, with users self-provisioning through a portal and admin staff managing the whole caboodle through a dashboard. Rackspace and many, many other suppliers have actively and vocally supported OpenStack. Now Scality has jumped aboard the OpenStack roundabout.
Cinder is the code-name for a block storage layer in OpenStack that enables virtual machines (VMs) to discover and use persistent block volumes, and Scality has provided a RING plug-in for it. Lecat said: "This contribution enables OpenStack adopters to catch up with Amazon EBS persistent volumes for virtual machines. With the Grizzly release, OpenStack Compute will have [a] storage companion, to be deployed in high demand, cloud computing environments. It will boost the market adoption for OpenStack."
Grizzly is the next release of OpenStack and is scheduled for April.
Scality is not alone. Coraid has also contributed drivers for its ATA-over-Ethernet (AoE) and Coraid EtherCloud to the OpenStack Cinder block storage open source project so OpenStackers can use its storage arrays for block storage. All-flash cloud storage array startup SolidFire has done the same, and it has been involved in Project Cinder for several years now. Coraid claims legacy storage providers like NetApp, EMC, HP, and Dell only have partially completed functions in their OpenStack drivers, and it has joined the OpenStack community as a corporate sponsor.
The RING deal for OpenStack offers a POSIX file interface via a Scale Out File System (SOFS) package. Scality states:
The Cinder integration is built on Scality’s … distributed sparse file technology embedded in SOFS. Each Cinder volume is effectively a file inside Scality scale-out storage. This ensures easy management, seamless scalability and enables advanced virtualisation features such as live migrations of virtual machines and instant failover in case of compute node hardware failure.
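The "volume as a sparse file" idea can be sketched on any Unix-like file system. The example below is not Scality's SOFS code or the Cinder driver: the function names and file name are hypothetical, and it relies only on the local file system supporting sparse files. It creates a file with a large apparent size and then reports how much space is actually allocated.

```python
import os
import tempfile
from typing import Tuple


def create_sparse_volume(path: str, size_bytes: int) -> None:
    """Create a file whose apparent size is size_bytes without writing any data."""
    with open(path, "wb") as f:
        # Extending an empty file with truncate() leaves a hole on file
        # systems that support sparse files, so no data blocks are written.
        f.truncate(size_bytes)


def volume_usage(path: str) -> Tuple[int, int]:
    """Return (apparent size, allocated bytes). st_blocks is Unix-only."""
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512  # st_blocks counts 512-byte units


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        volume = os.path.join(tmp, "volume-demo.img")  # hypothetical name
        create_sparse_volume(volume, 1 << 30)  # 1GiB apparent size
        apparent, allocated = volume_usage(volume)
        print(f"apparent={apparent} bytes, allocated={allocated} bytes")
```

On a file system with sparse file support, the allocated figure stays far below the apparent size until the VM actually writes to the volume, which is what lets a volume be provisioned instantly and grow on demand.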
Philippe Nicolas, Scality's Director of Product Strategy, said: “This block storage interface completes our Unified Storage strategy. Scality is one of the first players to actually deliver on the promise of true and complete unified storage access, including object, file and now block.”
Scality’s Cinder integration will be available with OpenStack’s Grizzly release. ®