Beyond MapReduce: Hadoop hangs on

Tooling up

Open ... and Shut Hadoop is all the rage in enterprise computing, and has become the poster child for the big-data movement. But just as the enterprise consolidates around Hadoop, the web world, including Google – which originated the technology ideas behind Hadoop – is moving on to real-time, ad-hoc analytics that batch-oriented Hadoop can't match.

Is Hadoop already outdated?

As Cloudant chief scientist Mike Miller points out, Google's MapReduce approach to big data analytics may already be passé. It certainly is at Google:

[Google's MapReduce] no longer holds such prominence in the Google stack... Google seems to be moving past it. In fact, many of the technologies [Google now uses like Percolator for incremental indexing and analysis of frequently changing datasets and Dremel for ad-hoc analytics] aren’t even new; they date back [to] the second half of the last decade, mere years after the seminal [MapReduce] paper was in print.

By one estimate, Hadoop, which is an open-source implementation of Google's MapReduce technology, hasn't even caught up to Google's original MapReduce framework. And now people like Miller are arguing that a MapReduce approach to Big Data is the wrong starting point altogether.

For a slow-moving enterprise, what to do?

The good news is that soon most enterprises likely won't have to bother with Hadoop at all, as Hadoop will be baked into the cloud applications that enterprises buy. And as those vendors figure out better technologies to handle real-time processing (like Storm) or ad hoc analysis (like Dremel), these technologies, too, will be baked into cloud applications.

As an interim step to such applications, big-data tools vendors like Datameer and Karmasphere are already releasing cloud-based tools for analyzing Hadoop data. This is critical to Hadoop's short-term success, as Forrester notes that Hadoop is still "an immature technology with many moving parts that are neither robust nor well integrated." Good tooling helps.

But is Hadoop the right place to start, good tooling or no?

Cloudscale chief executive Bill McColl, writing back in 2010, says "definitely not." He argues:

Simple batch processing tools like MapReduce and Hadoop are just not powerful enough in any one of the dimensions of the big data space that really matters. Sure, Hadoop is great for simple batch processing tasks that are “embarrassingly parallel”, but most of the difficult big data tasks confronting companies today are much more complex than that.

McColl isn't a neutral observer of Hadoop: his company competes with vanilla Hadoop deployments. My own company, Nodeable, offers a real-time complement to Hadoop, based on the open-source Storm project, but I'm much more sanguine about Hadoop's medium-term prospects than either McColl or Miller. Still, McColl's point is well taken, especially in light of Miller's observation that even the originator of MapReduce, Google, has largely moved on to faster, more responsive analytical tools.

Does it matter?

Probably not. At least, not anytime soon. It has long been the case that web giants like Facebook and Google have moved faster than enterprise IT, which tends to be much more risk-averse and more prone to hanging onto technology once it's made to work. So it's a Very Good Thing, as Businessweek highlights, that the web's technology of today is being open sourced to fuel the enterprise technology of tomorrow.

Hadoop still has several kinks to work out before it can go truly mainstream in the enterprise. It's not as if enterprises are going to go charging ahead into Percolator or other more modern approaches to big data when they have yet to squeeze Hadoop for maximum value. Enterprise IT managers like to travel in packs, and the pack is currently working on Hadoop. There may be better options out there, but they're going to need to find ways to complement Hadoop, not displace it. Hadoop simply has too much momentum going for it.

I suspect we'll see Hadoop continue forward as the primary engine of big data analytics. We're looking at many years of dominance for Hadoop. However, I think we'll also see add-on technologies offered by cloud vendors to augment the framework. Hadoop is never going to be a real-time system, so things like Storm will come to be viewed as must-have tools to provide real-time insight alongside Hadoop's timely, deep analytics.

Some early adopters will figure these tools out on their own without help from cloud application vendors. But most are going to buy, not build, and that "buy" decision will include plenty of Hadoop, whether from Cloudera or Metamarkets or Hortonworks or EMC or anybody else. That's why Forrester pegs today's Hadoop ecosystem at $1bn, a number that is only going to grow, no matter what Google thinks is a better approach to big data. ®

Matt Asay is senior vice president of business development at Nodeable, offering systems management for managing and analysing cloud-based data. He was formerly SVP of biz dev at HTML5 start-up Strobe and chief operating officer of Ubuntu commercial operation Canonical. With more than a decade spent in open source, Asay served as Alfresco's general manager for the Americas and vice president of business development, and he helped put Novell on its open source track. Asay is an emeritus board member of the Open Source Initiative (OSI). His column, Open...and Shut, appears three times a week on The Register.
