Show us the big data money: Isilon gets Hadooped

One-stop Hadoop/Isilon/Greenplum shop

EMC is betting big on big data analytics and has integrated the Hadoop filesystem into its Isilon scale-out filer offering and enabled its Greenplum analytics product to use Hadoop data.

The Hadoop Distributed File System (HDFS) is an object-style, distributed and scalable open source filesystem implemented across a cluster of DataNodes and a single NameNode. Larger clusters add a secondary NameNode, which snapshots the primary NameNode's data structures and can be used as a rebuild resource if the primary NameNode fails. The NameNode holds the metadata about files stored on the DataNodes, which serve them up on request.
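To make the metadata/data split concrete, here is a toy sketch in Python. It is not the Hadoop API: the class names, the `write_file` and `read_file` helpers, the round-robin placement and the four-byte block size are all assumptions made for illustration, and replication is left out entirely.

```python
# Toy model of the NameNode/DataNode split. All names here are hypothetical
# and chosen for illustration; this is not how the real Hadoop API looks.

class DataNode:
    """Stores raw blocks and serves them up on request."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block id -> bytes

    def store(self, block_id, data):
        self.blocks[block_id] = data

    def read(self, block_id):
        return self.blocks[block_id]


class NameNode:
    """Holds metadata only: which blocks make up a file and where they live."""

    def __init__(self, num_datanodes):
        self.num_datanodes = num_datanodes
        self.file_blocks = {}  # path -> list of (block id, datanode index)

    def allocate(self, path, num_blocks):
        # Assumed placement policy: simple round-robin across the datanodes.
        locations = [(f"{path}#{i}", i % self.num_datanodes)
                     for i in range(num_blocks)]
        self.file_blocks[path] = locations
        return locations

    def locate(self, path):
        return self.file_blocks[path]


def write_file(namenode, datanodes, path, data, block_size=4):
    """Client side: ask the NameNode where blocks go, then send them there."""
    chunks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    locations = namenode.allocate(path, len(chunks))
    for (block_id, index), chunk in zip(locations, chunks):
        datanodes[index].store(block_id, chunk)


def read_file(namenode, datanodes, path):
    """Client side: look up block locations, then fetch from the datanodes."""
    return b"".join(datanodes[index].read(block_id)
                    for block_id, index in namenode.locate(path))


if __name__ == "__main__":
    nodes = [DataNode("dn0"), DataNode("dn1")]
    nn = NameNode(num_datanodes=len(nodes))
    write_file(nn, nodes, "/demo.txt", b"hello world")
    print(nn.locate("/demo.txt"))
    print(read_file(nn, nodes, "/demo.txt"))
```

In this sketch the blocks survive on the DataNodes if the `NameNode` object is lost, but nothing records which blocks belong to which file or in what order, which is the single-point-of-failure concern in miniature.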

HDFS is popular today in universities, especially in life sciences, as well as for some Web 2.0 applications. Part of EMC's pitch is that the NameNode is a single point of failure with no high availability, which, it claims, effectively rules HDFS out for enterprise data centres. The company reckons that there is a large opportunity to provide Hadoop systems for big data analytics in corporate data centres if HDFS could be made usable in the enterprise sense and manageable by ordinary storage admins. That's what it's doing by providing an integrated Isilon-HDFS storage back-end for a Greenplum HD analytics front-end.

With the Isilon OneFS v6.5 release, EMC has provided a one-stop Apache Hadoop shop and what it sees as missing facilities in the Hadoop world, namely:

  • a sharable instead of a dedicated storage infrastructure;
  • high availability for the NameNode;
  • protection through snapshots (SnapshotIQ), replication (SyncIQ) and backup (NDMP);
  • improved storage efficiency beyond the 3X data mirroring of basic HDFS to the 80 per cent level;
  • ability to scale compute and capacity separately; and
  • automated data import/export via NFS, CIFS, FTP, and HTTP.
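The storage efficiency point is simple arithmetic, sketched below in Python. The sketch assumes that "3X data mirroring" means three full copies of every block, and that the "80 per cent level" means usable capacity as a fraction of raw capacity; the 80 per cent figure is EMC's claim as reported above, and the function names and the 100TB workload are made up for illustration.

```python
# Back-of-envelope comparison of raw capacity needed for a given amount of
# usable data. Function names and the 100TB figure are illustrative only.

def efficiency_of_replication(copies):
    """Usable fraction of raw capacity when every block is stored `copies` times."""
    return 1 / copies


def raw_needed_with_replication(usable_tb, copies):
    """Raw capacity required when every block is stored `copies` times."""
    return usable_tb * copies


def raw_needed_at_efficiency(usable_tb, efficiency):
    """Raw capacity required when `efficiency` of raw capacity is usable."""
    return usable_tb / efficiency


if __name__ == "__main__":
    usable_tb = 100  # hypothetical data set size
    mirrored = raw_needed_with_replication(usable_tb, copies=3)
    claimed = raw_needed_at_efficiency(usable_tb, efficiency=0.80)
    print(f"3X mirroring efficiency: {efficiency_of_replication(3):.0%}")
    print(f"Raw TB needed with 3X mirroring: {mirrored:.0f}")
    print(f"Raw TB needed at 80 per cent efficiency: {claimed:.0f}")
```

Under those assumptions, three-way mirroring leaves about a third of raw capacity usable, so the same data set needs well under half the raw disk at the claimed 80 per cent level.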

Nick Kirsch, Isilon's director of product management, said of the NameNode implementation: "This is unique. The NameNode is now part of our distributed metadata. Every node is a NameNode."

Next, Greenplum has certified Apache Hadoop and provides platform management and control, plus parallel analytics access with the Greenplum database. EMC is also providing design and training services, 24x7 support around the world and a roadmap for development.

EMC contrasts its approach with that of Oracle and NetApp, neither of whom, EMC claims, can provide Hadoop natively integrated with their storage arrays; full HA for the NameNode; the same level of storage efficiency; multi-protocol access; and corporate-level protection features.

Purdue University has tried out the Isilon/Hadoop combo in its statistics department and has endorsed it, saying that there is now no need for a separate Hadoop data silo and that its users now have "a single, shared storage resource for data computing and analytics". Its statisticians can do more statistics and less Hadoop infrastructure management.

EMC claims these added features will make Hadoop more usable by enterprises and also that enterprise Hadoop users will increasingly look to data scientists (see Wikibon description) to statistically analyse their big data sets for meaningful – and monetisable – information. After all, the ability to monetise the crunched data is the big data pay-off.

EMC Greenplum HD on Isilon is available immediately through EMC and its channel partners. ®
