Show us the big data money: Isilon gets Hadooped

One-stop Hadoop/Isilon/Greenplum shop

EMC is betting big on big data analytics and has integrated the Hadoop filesystem into its Isilon scale-out filer offering and enabled its Greenplum analytics product to use Hadoop data.

HDFS, the Hadoop Distributed File System, is an object-style, distributed and scalable open source filesystem implemented across a cluster of DataNodes and a single NameNode. Larger clusters add a secondary NameNode to snapshot the primary NameNode's data structures and to serve as a rebuild resource if the primary NameNode fails. The NameNode holds metadata about the files stored on the DataNodes, which serve them up on request.
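The split between metadata and data can be sketched with a toy in-memory model. This is an illustration only, not the Hadoop API: the names ToyNameNode, ToyDataNode and read_file, the block ids and the file path are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDataNode:
    """Holds raw block contents keyed by block id (hypothetical model)."""
    name: str
    blocks: dict = field(default_factory=dict)

    def read_block(self, block_id):
        return self.blocks[block_id]


@dataclass
class ToyNameNode:
    """Holds only metadata: which blocks make up a file and where they live."""
    # path -> ordered list of (block_id, [names of nodes holding a replica])
    block_map: dict = field(default_factory=dict)

    def locate(self, path):
        return self.block_map[path]


def read_file(path, namenode, datanodes):
    """Ask the NameNode where the blocks are, then fetch them from DataNodes."""
    data = b""
    for block_id, holders in namenode.locate(path):
        # Use the first replica holder that is still reachable.
        holder = next(h for h in holders if h in datanodes)
        data += datanodes[holder].read_block(block_id)
    return data


dn1 = ToyDataNode("dn1", {"b1": b"big ", "b2": b"data"})
dn2 = ToyDataNode("dn2", {"b1": b"big ", "b2": b"data"})
nn = ToyNameNode({"/demo.txt": [("b1", ["dn1", "dn2"]), ("b2", ["dn2", "dn1"])]})
nodes = {"dn1": dn1, "dn2": dn2}

print(read_file("/demo.txt", nn, nodes))
```

In this toy, losing one DataNode is survivable because each block has a second replica, whereas losing the only ToyNameNode leaves no way to find any block. That is the single-point-of-failure concern in miniature.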

HDFS is popular today in universities, especially in life sciences, as well as for some Web 2.0 applications. Part of EMC's pitch is that the NameNode is a single point of failure with no high-availability option, which, it claims, effectively rules HDFS out for enterprise data centres. The company reckons that there is a large opportunity to provide Hadoop systems for big data analytics in corporate data centres if HDFS can be made usable in the enterprise sense and manageable by ordinary storage admins. That's what it's doing by providing an integrated Isilon-HDFS storage back-end for a Greenplum HD analytics front-end.

With the Isilon OneFS v6.5 release, EMC has provided a one-stop Apache Hadoop shop and what it sees as missing facilities in the Hadoop world, namely:

  • A sharable instead of a dedicated storage infrastructure;
  • high availability for the NameNode;
  • protection through snapshots (SnapshotIQ), replication (SyncIQ) and backup (NDMP);
  • improved storage efficiency, from the roughly 33 per cent that basic HDFS's 3X data mirroring yields to the 80 per cent level;
  • ability to scale compute and capacity separately; and
  • automated data import/export via NFS, CIFS, FTP, and HTTP
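The storage-efficiency item in the list above comes down to simple arithmetic, sketched here. The 80 per cent figure is EMC's claim as reported; the 100TB of usable data and the function names are assumptions chosen purely for illustration.

```python
def mirroring_efficiency(copies):
    """Fraction of raw capacity that holds unique data when every block is stored `copies` times."""
    return 1 / copies


def raw_capacity_needed(usable_tb, efficiency):
    """Raw capacity (TB) required to hold `usable_tb` of data at a given efficiency."""
    return usable_tb / efficiency


usable_tb = 100                          # hypothetical data set size
hdfs_eff = mirroring_efficiency(3)       # basic HDFS 3X mirroring
claimed_eff = 0.80                       # EMC's claimed level, as reported

print(f"3X mirroring efficiency: {hdfs_eff:.0%}")
print(f"Raw TB at 3X mirroring:  {raw_capacity_needed(usable_tb, hdfs_eff):.0f}")
print(f"Raw TB at 80 per cent:   {raw_capacity_needed(usable_tb, claimed_eff):.0f}")
```

On these assumptions, the same 100TB of data needs about 300TB of raw disk under 3X mirroring and about 125TB at the claimed 80 per cent level.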

Nick Kirsch, Isilon's director of product management, said of the NameNode implementation: "This is unique. The NameNode is now part of our distributed metadata. Every node is a NameNode."

Next, Greenplum has certified Apache Hadoop and provided platform management and control, plus parallel analytics access with the Greenplum database. EMC is also providing design and training services, 24x7 support around the world and a roadmap for development.

EMC contrasts its approach with that of Oracle and NetApp, neither of whom, EMC claims, can provide Hadoop natively integrated with their storage arrays; full HA for the NameNode; the same level of storage efficiency; multi-protocol access; and corporate-level protection features.

Purdue University has tried out the Isilon/Hadoop combo in its statistics department and has endorsed it, saying that there is now no need for a separate Hadoop data silo and that its users now have "a single, shared storage resource for data computing and analytics". Its statisticians can do more statistics and less Hadoop infrastructure management.

EMC claims these added features will make Hadoop more usable by enterprises, and also that enterprise Hadoop users will increasingly look to data scientists (see Wikibon's description) to statistically analyse their big data sets for meaningful, and monetisable, information. After all, the ability to monetise the crunched data is the big data pay-off.

EMC Greenplum HD on Isilon is available immediately through EMC and its channel partners. ®
