
Beyond MapReduce: Hadoop hangs on

Tooling up


Open ... and Shut

Hadoop is all the rage in enterprise computing, and has become the poster child for the big-data movement. But just as the enterprise consolidates around Hadoop, the web world, including Google – which originated the technology ideas behind Hadoop – is moving on to real-time, ad-hoc analytics that batch-oriented Hadoop can't match.

Is Hadoop already outdated?

As Cloudant chief scientist Mike Miller points out, Google's MapReduce approach to big data analytics may already be passé. It certainly is at Google:

[Google's MapReduce] no longer holds such prominence in the Google stack... Google seems to be moving past it. In fact, many of the technologies [Google now uses like Percolator for incremental indexing and analysis of frequently changing datasets and Dremel for ad-hoc analytics] aren’t even new; they date back [to] the second half of the last decade, mere years after the seminal [MapReduce] paper was in print.

By one estimate, Hadoop, which is an open-source implementation of Google's MapReduce technology, hasn't even caught up to Google's original MapReduce framework. And now people like Miller are arguing that a MapReduce approach to Big Data is the wrong starting point altogether.

For a slow-moving enterprise, what to do?

The good news is that soon most enterprises likely won't have to bother with Hadoop at all, as Hadoop will be baked into the cloud applications that enterprises buy. And as the vendors of those applications adopt better technologies for real-time processing (like Storm) or ad-hoc analysis (like Dremel), those technologies, too, will be baked into cloud applications.

As an interim step to such applications, big-data tools vendors like Datameer and Karmasphere are already releasing cloud-based tools for analyzing Hadoop data. This is critical to Hadoop's short-term success: as Forrester notes, Hadoop is still "an immature technology with many moving parts that are neither robust nor well integrated." Good tooling helps.

But is Hadoop the right place to start, good tooling or no?

Cloudscale chief executive Bill McColl, writing back in 2010, says "definitely not." He argues:

Simple batch processing tools like MapReduce and Hadoop are just not powerful enough in any one of the dimensions of the big data space that really matters. Sure, Hadoop is great for simple batch processing tasks that are “embarrassingly parallel”, but most of the difficult big data tasks confronting companies today are much more complex than that.

McColl isn't a neutral observer of Hadoop: his company competes with vanilla Hadoop deployments. My own company, Nodeable, offers a real-time complement to Hadoop, based on the open-source Storm project, and I'm much more sanguine about Hadoop's medium-term prospects than either McColl or Miller. Still, McColl's point is well taken, especially in light of Miller's observation that even the originator of MapReduce, Google, has largely moved on to faster, more responsive analytical tools.

Does it matter?

Probably not. At least, not anytime soon. It has long been the case that web giants like Facebook and Google have moved faster than enterprise IT, which tends to be much more risk-averse and more prone to hanging onto technology once it's made to work. So it's a Very Good Thing, as Businessweek highlights, that the web's technology of today is being open sourced to fuel the enterprise technology of tomorrow.

Hadoop still has several kinks to work out before it can go truly mainstream in the enterprise. It's not as if enterprises are going to go charging ahead into Percolator or other more modern approaches to big data when they have yet to squeeze Hadoop for maximum value. Enterprise IT managers like to travel in packs, and the pack is currently working on Hadoop. There may be better options out there, but they're going to need to find ways to complement Hadoop, not displace it. Hadoop simply has too much momentum going for it.

I suspect we'll see Hadoop continue forward as the primary engine of big data analytics. We're looking at many years of dominance for Hadoop. However, I think we'll also see add-on technologies offered by cloud vendors to augment the framework. Hadoop is never going to be a real-time system, so things like Storm will come to be viewed as must-have tools to provide real-time insight alongside Hadoop's timely, deep analytics.

Some early adopters will figure these tools out on their own without help from cloud application vendors. But most enterprises are going to buy, not build, and that "buy" decision will include plenty of Hadoop, whether from Cloudera or Metamarkets or Hortonworks or EMC or anybody else. That's why Forrester pegs today's Hadoop ecosystem at $1bn, a number that is only going to grow, no matter what Google thinks is a better approach to big data. ®

Matt Asay is senior vice president of business development at Nodeable, offering systems management for managing and analysing cloud-based data. He was formerly SVP of biz dev at HTML5 start-up Strobe and chief operating officer of Ubuntu commercial operation Canonical. With more than a decade spent in open source, Asay served as Alfresco's general manager for the Americas and vice president of business development, and he helped put Novell on its open source track. Asay is an emeritus board member of the Open Source Initiative (OSI). His column, Open...and Shut, appears three times a week on The Register.
