Show us the big data money: Isilon gets Hadooped

One-stop Hadoop/Isilon/Greenplum shop

EMC is betting big on big data analytics and has integrated the Hadoop filesystem into its Isilon scale-out filer offering and enabled its Greenplum analytics product to use Hadoop data.

Hadoop's filesystem (HDFS) is an object-style, distributed and scalable open source filesystem implemented across a cluster of DataNodes and a single NameNode. Larger clusters add a secondary NameNode, which snapshots the primary NameNode's data structures and can be used as a rebuild resource if the primary NameNode fails. The NameNode holds metadata about the files stored on the DataNodes, which serve them up on request.
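To make the division of labour concrete, here is a toy in-memory sketch in Python of the layout just described: one metadata holder and several block holders. It is an illustration only and does not use Hadoop's real API; the class and function names, the 4-byte block size and the round-robin placement are all assumptions made for the example.

```python
# Toy in-memory model of the NameNode/DataNode split; NOT Hadoop's real API.
# All names, the tiny block size and the placement rule are illustrative assumptions.
from dataclasses import dataclass, field


@dataclass
class DataNode:
    name: str
    blocks: dict = field(default_factory=dict)  # block id -> bytes


@dataclass
class NameNode:
    # file path -> ordered list of (block id, [names of DataNodes holding a copy])
    files: dict = field(default_factory=dict)


def write_file(namenode, datanodes, path, data, block_size=4, replicas=3):
    """Split data into blocks and store `replicas` copies of each block."""
    entries = []
    for offset in range(0, len(data), block_size):
        index = offset // block_size
        block_id = f"{path}#{index}"
        # Simple round-robin placement (assumption); real placement is more involved.
        targets = [datanodes[(index + r) % len(datanodes)] for r in range(replicas)]
        for node in targets:
            node.blocks[block_id] = data[offset:offset + block_size]
        entries.append((block_id, [node.name for node in targets]))
    namenode.files[path] = entries


def read_file(namenode, datanodes, path):
    """Ask the NameNode where the blocks live, then fetch them from DataNodes."""
    by_name = {node.name: node for node in datanodes}
    chunks = []
    for block_id, holders in namenode.files[path]:
        # Any surviving copy will do; take the first holder that still has the block.
        holder = next(by_name[h] for h in holders if block_id in by_name[h].blocks)
        chunks.append(holder.blocks[block_id])
    return b"".join(chunks)


if __name__ == "__main__":
    nodes = [DataNode(f"dn{i}") for i in range(4)]
    nn = NameNode()
    write_file(nn, nodes, "/logs/a.txt", b"hello hadoop")
    nodes[0].blocks.clear()  # lose one DataNode: replicas keep the file readable
    print(read_file(nn, nodes, "/logs/a.txt"))
    nn.files.clear()  # lose the only NameNode: the blocks exist but cannot be located
```

In this sketch, wiping one DataNode leaves the file readable, while wiping the NameNode's table makes every file unreachable even though the blocks are still on the DataNodes.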

HDFS is popular today in universities, especially in life sciences, as well as for some Web 2.0 applications. Part of EMC's pitch is that the NameNode is a single point of failure with no high availability, which, it claims, effectively rules HDFS out for enterprise data centres. The company reckons there is a large opportunity to provide Hadoop systems for big data analytics in corporate data centres if HDFS can be made enterprise-ready and manageable by ordinary storage admins. That is what it is doing by providing an integrated Isilon-HDFS storage back-end for a Greenplum HD analytics front-end.

With the Isilon OneFS v6.5 release, EMC has provided a one-stop Apache Hadoop shop and added what it sees as facilities missing from the Hadoop world, namely:

  • a sharable instead of a dedicated storage infrastructure;
  • high availability for the NameNode;
  • protection through snapshots (SnapshotIQ), replication (SyncIQ) and backup (NDMP);
  • improved storage efficiency, up from the roughly 33 per cent usable capacity that basic HDFS's 3X data mirroring allows to the 80 per cent level;
  • ability to scale compute and capacity separately; and
  • automated data import/export via NFS, CIFS, FTP, and HTTP.
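The storage-efficiency point in the list above comes down to simple arithmetic, sketched here in Python. The 3X mirroring and the 80 per cent figure come from EMC's pitch; the 100TB data set and the function names are assumptions made for illustration, and the sketch says nothing about how Isilon reaches that level.

```python
# Back-of-envelope comparison of raw capacity needs; figures are illustrative.
# 3X mirroring and 80 per cent efficiency are the numbers quoted in the article;
# the 100TB data set is an assumed example size.

def raw_tb_with_mirroring(usable_tb, copies=3):
    """Raw capacity needed when every block is stored `copies` times."""
    return usable_tb * copies


def raw_tb_at_efficiency(usable_tb, efficiency_percent):
    """Raw capacity needed when `efficiency_percent` of raw space holds user data."""
    return usable_tb * 100 / efficiency_percent


if __name__ == "__main__":
    data_tb = 100  # assumed size of the data set
    mirrored = raw_tb_with_mirroring(data_tb)
    efficient = raw_tb_at_efficiency(data_tb, 80)
    print(f"3X mirroring: {mirrored} TB raw for {data_tb} TB of data")
    print(f"80 per cent efficiency: {efficient} TB raw for {data_tb} TB of data")
```

On these assumptions, the same 100TB of data needs 300TB of raw disk with triple mirroring and 125TB at 80 per cent efficiency.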

Nick Kirsch, Isilon's director of product management, said of the NameNode implementation: "This is unique. The NameNode is now part of our distributed metadata. Every node is a NameNode."

Next, Greenplum has certified Apache Hadoop and provides platform management and control, plus parallel analytics access with the Greenplum database. EMC is also providing design and training services, 24x7 support around the world and a roadmap for development.

EMC contrasts its approach with that of Oracle and NetApp, neither of which, EMC claims, can provide Hadoop natively integrated with its storage arrays, full HA for the NameNode, the same level of storage efficiency, multi-protocol access, or corporate-level protection features.

Purdue University has tried out the Isilon/Hadoop combo in its statistics department and has endorsed it, saying that there is now no need for a separate Hadoop data silo and that its users now have "a single, shared storage resource for data computing and analytics". Its statisticians can do more statistics and less Hadoop infrastructure management.

EMC claims these added features will make Hadoop more usable by enterprises, and that enterprise Hadoop users will increasingly look to data scientists (see Wikibon's description) to statistically analyse their big data sets for meaningful – and monetisable – information. After all, the ability to monetise the crunched data is the big data pay-off.

EMC Greenplum HD on Isilon is available immediately through EMC and its channel partners. ®
