Scality commits to Big Data, puts a RING on Hadoop elephant

Also adds plug-in for OpenStack's Cinder

Object storage start-up Scality has added its storage to Hadoop so users can avoid loading data through Hadoop's own file system. It has also unveiled a plug-in for Cinder, the block storage layer within the OpenStack project.

The RING is an object storage infrastructure built on a set of x86 server nodes that store objects, rather than files or blocks, and can operate in parallel.

Scality has produced what it calls a "production-grade Hadoop storage implementation" using CDMI, the Cloud Data Management Interface standard for cloud storage developed and promoted by the SNIA. CDMI support by vendors started slowly but is picking up pace.

Scality has replaced the Hadoop Name Node server with its own metadata architecture, thereby eliminating the single point of failure in Hadoop’s architecture. The company says its Hadoop implementation enables in-place processing, meaning compute on the storage node itself, and significantly reduces the need for data transfer by sharing data location with the Job Tracker.

Scality says that its RING's erasure coding means any Hadoop hardware overhead due to replication is obviated. Also "users can write and read files through a standard file system, and at the same time process the content with Hadoop, without needing to load the files through HDFS, the Hadoop Distributed File System".
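To put rough numbers on the overhead argument, here is a minimal sketch of the capacity arithmetic. Three copies is HDFS's default replication factor; the 9+3 erasure-coding layout and the 100TB data set are illustrative assumptions, not figures from Scality. Note that erasure coding shrinks the overhead rather than removing it entirely.

```python
def replication_raw_tb(usable_tb, copies=3):
    """Raw capacity needed when every block is stored `copies` times."""
    return usable_tb * copies


def erasure_raw_tb(usable_tb, data_fragments, parity_fragments):
    """Raw capacity needed when data is split into `data_fragments`
    pieces and protected by `parity_fragments` extra pieces."""
    total = data_fragments + parity_fragments
    return usable_tb * total / data_fragments


usable = 100  # TB of user data (hypothetical)

# Three full copies, as with HDFS's default replication factor.
print("replication:", replication_raw_tb(usable), "TB raw")

# Assumed 9 data + 3 parity layout, chosen only for illustration.
print("erasure coding:", round(erasure_raw_tb(usable, 9, 3), 1), "TB raw")
```

Under these assumptions the erasure-coded layout needs well under half the raw capacity of triple replication, while still tolerating the loss of any three fragments.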

Jerome Lecat, Scality's CEO, said: "We have contributed our Hadoop solution to the CDMI community, ensuring that it can be used with any CDMI-compatible storage. … Our CDMI framework can read data directly from our scale-out file system, it is not necessary to do an HDFS ingest prior to performing a MapReduce job."

The Scality offering is compatible with, and has been tested with, Hortonworks HDP 1.0 and Cloudera CDH4; it doesn't appear that Scality is looking to replace or compete with existing Hadoop distributions. By adding a RING back end, as it were, Scality says it produces a more cost-effective, easier-to-use, more resilient and higher-performance Hadoop infrastructure, with users benefiting from Scality's SOFS (Scale-Out File System).

Lecat said: "Our angle is that we think that people will want to be able to do Hadoop job on 'normal' data, not just what they specifically prepared for Hadoop. In my mind, this is the very advantage of Hadoop, but it is killed by the fact that people need to do an HDFS ingest before any MapReduce job. Not with us anymore."

An implication is this, Lecat says: "Just imagine what you can do if you now use MapReduce – which is working on the storage nodes themselves – to do data transformation, like new encoding, as a new versions comes out. This saves a lot of processing time. It used to be necessary to move the data from storage to a server, do the transformation and then write it back on storage."

OpenStack Object Storage

OpenStack is a cloud, or Infrastructure-as-a-Service (IaaS), platform based on free, open-source software that controls pools of compute, storage and networking resources in a data centre, with users self-provisioning through a portal and admin staff managing the whole caboodle through a dashboard. Rackspace and many, many other suppliers have actively and vocally supported OpenStack. Now Scality has jumped aboard the OpenStack roundabout.

Cinder is the code-name for a block storage layer in OpenStack that enables virtual machines (VMs) to discover and use persistent block volumes, and Scality has provided a RING plug-in for it. Lecat said: "This contribution enables OpenStack adopters to catch up with Amazon EBS persistent volumes for virtual machines. With the Grizzly release, OpenStack Compute will have [a] storage companion, to be deployed in high demand, cloud computing environments. It will boost the market adoption for OpenStack."

Grizzly is the next release of OpenStack, scheduled for April.

Scality is not alone. Coraid has also contributed drivers for its ATA-over-Ethernet (AoE) and Coraid EtherCloud to the OpenStack Cinder block storage open source project so OpenStackers can use its storage arrays for block storage. All-flash cloud storage array startup SolidFire has done the same, and it has been involved in Project Cinder for several years now. Coraid claims legacy storage providers like NetApp, EMC, HP, and Dell only have partially completed functions in their OpenStack drivers, and it has joined the OpenStack community as a corporate sponsor.

The RING deal for OpenStack offers a POSIX file interface via a Scale Out File System (SOFS) package. Scality states:

The Cinder integration is built on Scality’s … distributed sparse file technology embedded in SOFS. Each Cinder volume is effectively a file inside Scality scale-out storage. This ensures easy management, seamless scalability and enables advanced virtualisation features such as live migrations of virtual machines and instant failover in case of compute node hardware failure.
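The volume-as-a-sparse-file idea can be illustrated with a small sketch on an ordinary local file system. This is not Scality's or Cinder's code: the helper names, the file name and the 64MB size are all hypothetical, and it only shows the general technique of giving a file a large logical size while storing just the blocks that have actually been written.

```python
import os
import tempfile


def create_sparse_volume(path, size_bytes):
    """Create a file with the given logical size without writing any data."""
    with open(path, "wb") as f:
        f.truncate(size_bytes)


def write_block(path, offset, data):
    """Write data at a byte offset, as a block device client would."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)


def read_block(path, offset, length):
    """Read `length` bytes at a byte offset; unwritten regions read as zeros."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


with tempfile.TemporaryDirectory() as tmp:
    volume = os.path.join(tmp, "volume-demo.img")  # hypothetical name
    create_sparse_volume(volume, 64 * 1024 * 1024)
    write_block(volume, 10 * 4096, b"hello")

    st = os.stat(volume)
    print("logical size:", st.st_size)
    # st_blocks is POSIX-only; where available it shows how little
    # space is really allocated on file systems that support sparse files.
    blocks = getattr(st, "st_blocks", None)
    if blocks is not None:
        print("allocated bytes:", blocks * 512)
    print("data:", read_block(volume, 10 * 4096, 5))
```

Because the volume is just a file, anything that can reach the file system can reach the volume, which is what makes features such as live migration straightforward in principle.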

Philippe Nicolas, Scality's Director of Product Strategy, said: “This block storage interface completes our Unified Storage strategy. Scality is one of the first players to actually deliver on the promise of true and complete unified storage access, including object, file and now block.”

Scality’s Cinder integration will be available with OpenStack’s Grizzly release. ®
