Scality commits to Big Data, puts a RING on Hadoop elephant

Also adds plug-in for OpenStack's Cinder

Object storage start-up Scality has added its storage to Hadoop so users can avoid loading data through Hadoop's own file system. It has also unveiled a plug-in for Cinder, the block storage layer within the OpenStack project.

The RING is an object storage infrastructure based on a set of X86 server nodes that store objects, not files or blocks, and can operate in parallel.

Scality has produced what it calls a "production-grade Hadoop storage implementation" using CDMI (Cloud Data Management Interface), the cloud storage standard developed and promoted by the SNIA. CDMI support by vendors started slowly but is picking up pace.

Scality has replaced the Hadoop Name Node server with its own metadata architecture, thereby eliminating the single point of failure in Hadoop’s architecture. The company says its Hadoop implementation enables in-place processing, meaning compute on the storage node itself, and significantly reduces the need for data transfer by sharing data location with the Job Tracker.

Scality says that its RING's erasure coding means any Hadoop hardware overhead due to replication is obviated. Also "users can write and read files through a standard file system, and at the same time process the content with Hadoop, without needing to load the files through HDFS, the Hadoop Distributed File System".
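To put rough numbers on the replication-versus-erasure-coding claim, here is a minimal sketch in Python. It assumes Hadoop's default of three full copies per block and a hypothetical 9+3 erasure-coding layout; the article does not state which scheme the RING actually uses, so the fragment counts and function names are illustrative only.

```python
def replication_overhead(copies: int) -> float:
    """Raw bytes stored per byte of user data when keeping full copies."""
    if copies < 1:
        raise ValueError("need at least one copy")
    return float(copies)


def erasure_overhead(data_fragments: int, parity_fragments: int) -> float:
    """Raw bytes stored per byte of user data with a k+m erasure code."""
    if data_fragments < 1 or parity_fragments < 0:
        raise ValueError("invalid fragment counts")
    return (data_fragments + parity_fragments) / data_fragments


def raw_capacity_tb(user_tb: float, overhead: float) -> float:
    """Raw capacity needed to hold user_tb of data at a given overhead."""
    return user_tb * overhead


# Assumed values: 3 copies (HDFS default) versus a hypothetical 9+3 layout.
hdfs_raw = raw_capacity_tb(100, replication_overhead(3))
ec_raw = raw_capacity_tb(100, erasure_overhead(9, 3))
print(f"3x replication: {hdfs_raw:.0f} TB raw for 100 TB of data")
print(f"9+3 erasure coding: {ec_raw:.0f} TB raw for 100 TB of data")
```

Under these assumptions, 100 TB of data needs 300 TB of raw disk with triple replication but about 133 TB with the 9+3 layout, so the extra hardware shrinks considerably rather than vanishing outright.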

Jerome Lecat, Scality's CEO, said: "We have contributed our Hadoop solution to the CDMI community, ensuring that it can be used with any CDMI-compatible storage. … Our CDMI framework can read data directly from our scale-out file system, it is not necessary to do an HDFS ingest prior to performing a MapReduce job.”

The Scality offering is compatible with, and has been tested with, Hortonworks HDP 1.0 and Cloudera CDH4 - it doesn't appear that Scality is looking to replace or compete with existing Hadoop distributions. By adding a RING back end, as it were, Scality says it produces a more cost-effective, easier-to-use, more resilient and higher-performance Hadoop infrastructure, with users benefitting from Scality's SOFS (Scale-Out File System).

Lecat said: "Our angle is that we think that people will want to be able to do Hadoop job on 'normal' data, not just what they specifically prepared for Hadoop. In my mind, this is the very advantage of Hadoop, but it is killed by the fact that people need to do an HDFS ingest before any MapReduce job. Not with us anymore."

An implication is this, Lecat says: "Just imagine what you can do if you now use MapReduce – which is working on the storage nodes themselves – to do data transformation, like new encoding, as a new versions comes out. This saves a lot of processing time. It used to be necessary to move the data from storage to a server, do the transformation and then write it back on storage."

OpenStack Object Storage

OpenStack is a cloud or Infrastructure-as-a-Service (IaaS) platform based on free, open-source software that controls pools of compute, storage and networking resources in a data centre, with users self-provisioning through a portal and admin staff managing the whole caboodle through a dashboard. Rackspace and many, many other suppliers have actively and vocally supported OpenStack. Now Scality has jumped aboard the OpenStack roundabout.

Cinder is the code-name for a block storage layer in OpenStack that enables virtual machines (VMs) to discover and use persistent block volumes, and Scality has provided a RING plug-in for it. Lecat said: "This contribution enables OpenStack adopters to catch up with Amazon EBS persistent volumes for virtual machines. With the Grizzly release, OpenStack Compute will have [a] storage companion, to be deployed in high demand, cloud computing environments. It will boost the market adoption for OpenStack.”

Grizzly is the next OpenStack release, scheduled for April.

Scality is not alone. Coraid has also contributed drivers for its ATA-over-Ethernet (AoE) and Coraid EtherCloud products to the OpenStack Cinder block storage open source project so OpenStackers can use its storage arrays for block storage. All-flash cloud storage array startup SolidFire has done the same, and it has been involved in Project Cinder for several years now. Coraid claims legacy storage providers like NetApp, EMC, HP, and Dell only have partially completed functions in their OpenStack drivers, and it has joined the OpenStack community as a corporate sponsor.

The RING deal for OpenStack offers a POSIX file interface via a Scale Out File System (SOFS) package. Scality states:

The Cinder integration is built on Scality’s … distributed sparse file technology embedded in SOFS. Each Cinder volume is effectively a file inside Scality scale-out storage. This ensures easy management, seamless scalability and enables advanced virtualisation features such as live migrations of virtual machines and instant failover in case of compute node hardware failure.
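The "volume is a file" idea in Scality's statement can be illustrated with a minimal Python sketch that treats an ordinary local file as a block volume. This is not Scality's SOFS code or the Cinder driver API; the function names are hypothetical, and whether the file is actually stored sparsely depends on the underlying file system.

```python
import os


def create_sparse_volume(path: str, size_bytes: int) -> None:
    """Create a file of the requested logical size without writing data.

    On file systems that support sparse files, unwritten regions
    typically consume no disk space until they are written.
    """
    with open(path, "wb") as f:
        f.truncate(size_bytes)


def write_block(path: str, offset: int, data: bytes) -> None:
    """Write data at a byte offset, as a block device write would."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)


def read_block(path: str, offset: int, length: int) -> bytes:
    """Read length bytes starting at a byte offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def logical_size(path: str) -> int:
    """Size the volume presents to its user, regardless of space used."""
    return os.path.getsize(path)
```

A caller would create the volume once, then read and write at arbitrary offsets. Regions that were never written read back as zero bytes on most systems, which is what lets a large volume be provisioned instantly.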

Philippe Nicolas, Scality's Director of Product Strategy, said: “This block storage interface completes our Unified Storage strategy. Scality is one of the first players to actually deliver on the promise of true and complete unified storage access, including object, file and now block.”

Scality’s Cinder integration will be available with OpenStack’s Grizzly release. ®
