Show us the big data money: Isilon gets Hadooped

One-stop Hadoop/Isilon/Greenplum shop

EMC is betting big on big data analytics: it has integrated the Hadoop filesystem into its Isilon scale-out filer offering and enabled its Greenplum analytics product to use Hadoop data.

Hadoop's filesystem, HDFS, is a distributed and scalable open source filesystem implemented across a cluster of DataNodes and a single NameNode. Larger clusters add a secondary NameNode, which snapshots the primary NameNode's data structures and can be used as a rebuild resource if the primary NameNode fails. The NameNode holds metadata about the files; the DataNodes store the file data and serve it up on request.
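The split between metadata and data described above can be sketched in a few lines of Python. This is a toy model only, not the real HDFS API: the class names mirror the HDFS roles, but `write_file`, `read_file`, the 4-byte block size and the round-robin replica placement are all assumptions made for illustration.

```python
# Toy model of the NameNode/DataNode split; NOT the real HDFS API.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataNode:
    """Stores block contents keyed by block id and serves them on request."""
    name: str
    blocks: Dict[int, bytes] = field(default_factory=dict)

    def read_block(self, block_id: int) -> bytes:
        return self.blocks[block_id]


@dataclass
class NameNode:
    """Holds only metadata: which blocks form a file and which nodes hold them."""
    file_blocks: Dict[str, List[int]] = field(default_factory=dict)
    block_locations: Dict[int, List[DataNode]] = field(default_factory=dict)


def write_file(namenode, path, data, datanodes, block_size=4, replicas=3):
    """Split data into blocks, copy each block to several DataNodes, record metadata."""
    block_ids = []
    for offset in range(0, len(data), block_size):
        block_id = len(namenode.block_locations)
        chunk = data[offset:offset + block_size]
        # Round-robin placement: a simplification of any real placement policy.
        holders = [datanodes[(block_id + i) % len(datanodes)] for i in range(replicas)]
        for node in holders:
            node.blocks[block_id] = chunk
        namenode.block_locations[block_id] = holders
        block_ids.append(block_id)
    namenode.file_blocks[path] = block_ids


def read_file(namenode, path):
    """Ask the NameNode where each block lives, then fetch it from a DataNode."""
    return b"".join(
        namenode.block_locations[block_id][0].read_block(block_id)
        for block_id in namenode.file_blocks[path]
    )


if __name__ == "__main__":
    nodes = [DataNode(f"dn{i}") for i in range(4)]
    nn = NameNode()
    write_file(nn, "/logs/a.txt", b"hello big data", nodes)
    print(read_file(nn, "/logs/a.txt"))
```

In this sketch every read goes through the NameNode's tables first, which is why losing a lone NameNode makes the data on the DataNodes unreachable even though the blocks themselves survive.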

HDFS is popular today in universities, especially in life sciences, as well as for some Web 2.0 applications. Part of EMC's pitch is that the NameNode is a single point of failure with no high-availability option, which, it claims, effectively rules HDFS out for enterprise data centres. The company reckons that there is a large opportunity to provide Hadoop systems for big data analytics in corporate data centres if HDFS can be made enterprise-grade and manageable by ordinary storage admins. That's what it is doing by providing an integrated Isilon-HDFS storage back-end for a Greenplum HD analytics front-end.

With the Isilon OneFS v6.5 release, EMC has provided a one-stop Apache Hadoop shop and what it sees as missing facilities in the Hadoop world, namely:

  • a shareable instead of a dedicated storage infrastructure;
  • high availability for the NameNode;
  • protection through snapshots (SnapshotIQ), replication (SyncIQ) and backup (NDMP);
  • improved storage efficiency, from the roughly 33 per cent of raw capacity left usable by basic HDFS's 3X data mirroring to the 80 per cent level;
  • ability to scale compute and capacity separately; and
  • automated data import/export via NFS, CIFS, FTP, and HTTP.
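The storage-efficiency claim in the list above comes down to simple arithmetic. The sketch below assumes that "80 per cent" means usable capacity as a fraction of raw capacity, which the article does not spell out; the function name and the 100TB data set are hypothetical.

```python
# Raw capacity needed to hold a data set at a given storage efficiency.
# Assumption: efficiency = usable capacity / raw capacity.

def raw_capacity_needed(usable_tb: float, efficiency: float) -> float:
    """Return the raw terabytes required to store `usable_tb` of data."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return usable_tb / efficiency


MIRROR_3X_EFFICIENCY = 1 / 3   # three full copies of every block
CLAIMED_EFFICIENCY = 0.80      # EMC's quoted figure, as interpreted above

if __name__ == "__main__":
    data_tb = 100.0  # hypothetical data set size
    print(round(raw_capacity_needed(data_tb, MIRROR_3X_EFFICIENCY), 1))
    print(round(raw_capacity_needed(data_tb, CLAIMED_EFFICIENCY), 1))
```

Under that reading, a 100TB data set needs about 300TB of raw disk with 3X mirroring and about 125TB at 80 per cent efficiency.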

Nick Kirsch, Isilon's director of product management, said of the NameNode implementation: "This is unique. The NameNode is now part of our distributed metadata. Every node is a NameNode."

Next, Greenplum has certified Apache Hadoop and provided platform management and control, along with parallel analytics access through the Greenplum database. EMC is also providing design and training services, 24x7 support around the world and a roadmap for development.

EMC contrasts its approach with that of Oracle and NetApp, neither of which, EMC claims, can provide Hadoop natively integrated with their storage arrays; full HA for the NameNode; the same level of storage efficiency; multi-protocol access; and corporate-level protection features.

Purdue University has tried out the Isilon/Hadoop combo in its statistics department and has endorsed it, saying that there is now no need for a separate Hadoop data silo and that its users now have "a single, shared storage resource for data computing and analytics". Its statisticians can do more statistics and less Hadoop infrastructure management.

EMC claims these added features will make Hadoop more usable by enterprises, and also that enterprise Hadoop users will increasingly look to data scientists (see Wikibon description) to statistically analyse their big data sets for meaningful – and monetisable – information. After all, the ability to monetise the crunched data is the big data pay-off.

EMC Greenplum HD on Isilon is available immediately through EMC and its channel partners. ®
