Doug Cutting: Hadoop dodged a Microsoft-Oracle stomping

Elephant daddy on breaking into mainstream IT

Interview We’ve all heard plenty about open source changing the dynamics of the tech industry and upsetting the old order. Open source, we’re told, is manifest destiny. Companies that ignore it will be consigned to history and CIOs who assert there’s no freebie code behind their firewalls are out of touch with devs happily humming to Tomcat, Apache, Linux and PHP. At least that's how the story goes.

One open-source success of recent years has been Hadoop, the Apache-licensed implementation of Google’s MapReduce, which quickly and efficiently processes petabytes of data using clusters of ordinary x86 servers.

MapReduce works by splitting up massive data-processing jobs into chunks to be processed locally and parallelising the computation (the map phase), and then re-combining the results (the reduce phase) at the end. It means you don’t need big, centralised servers like mainframes or SPARC servers; it's a gift to x86 computing.
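To make the two phases concrete, here is a minimal word-count sketch in Python. It is an illustration of the map and reduce idea only, not Hadoop's actual API: the function names (`map_chunk`, `reduce_counts`, `word_count`) are hypothetical, and a thread pool stands in for the cluster of x86 servers.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(chunk):
    """Map phase: count the words in one chunk, independently of the others."""
    return Counter(chunk.lower().split())


def reduce_counts(left, right):
    """Reduce phase: merge two partial counts into one."""
    return left + right


def word_count(chunks):
    # Threads stand in for cluster nodes here (an assumption for illustration);
    # each chunk is mapped in parallel, then the partial results are combined.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(reduce_counts, partials, Counter())


# Three small "chunks" of a larger data set.
chunks = ["big data big clusters", "data on cheap servers", "big cheap clusters"]
totals = word_count(chunks)
```

Because each chunk is mapped without reference to the others, the work can be spread across as many machines as there are chunks, which is what lets commodity hardware replace a single large server.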

In the eight years since Hadoop was first written by Doug Cutting and Mike Cafarella, it has found a home running Amazon.com, Facebook and Yahoo! – some of the biggest sites on the web – among others. In the last eight months alone, Hadoop has won the backing of Microsoft, IBM and Oracle – three of the biggest names in relational databases. The software giants are now supporting a piece of software that the NoSQL zealots believed would actually kill RDBMS.

Microsoft is working with Hadoop start-up Hortonworks to write connectors between Hadoop and its databases, Windows and the Azure cloud, while Oracle is marrying Hadoop with its open-source MySQL database and combining the result with Sun Microsystems server hardware to produce yet another Oracle appliance.

Yet, with history apparently on his side, you'd be surprised to learn Cutting feared Oracle and Microsoft might try to stand up to Hadoop, with disastrous consequences for the ecosystem. He said he is "gratified" that the pair decided to come on board.

“I’m pleased they decided not to fight it [Hadoop] with some proprietary solution, but to join forces with the open source one,” Cutting told The Reg. “It means those would otherwise be two potential sources of serious competition and to grow the community with two companies as big and powerful as Oracle and Microsoft is tremendous.

“I’m really gratified they have elected not to [develop proprietary solutions]. It’s a good thing for Hadoop for sure... I no longer see a formidable competitor, which is a little frightening," Cutting said.

“[It’s] frightening and exciting at the same time because it’s something you have to worry about, to win them over and convince them that this is a better approach. It’s gratifying when you haven’t got to do that.”

Microsoft and Oracle could have forked Hadoop’s code, building versions of Hadoop tailored to their own systems and thereby splitting the community into those who support Hadoop for the large communities of Oracle and SQL Server users, and everybody else.

Not possible, you say? Oracle has played politics before to get its way – with disastrous results for open-source projects. Oracle pulled the open-source Solaris project, OpenSolaris, back in-house in 2010 – allowing the fledgling open-source effort that had been blessed and spun up by Sun to die. Oracle’s control of OpenOffice produced the LibreOffice fork in 2011, while Oracle's reluctance to let go of the Hudson build management system saw almost the entire community leave to create the rival Jenkins.

The legacy of such actions: forked codebases and rival claims over which is the one "true" project. Oracle has the brands, but the community has the code.

Then there’s Microsoft. Redmond is a strategic friend to open source, supporting projects where they help sell more copies of Windows or at least prevent lost sales. So far it has worked on Linux, MySQL, PHP, and cuddled up to Eclipse on Silverlight.

On big data, Microsoft had been building a Hadoop-esque architecture since 2006. Called Dryad, it would “efficiently” process huge data loads running on Windows HPC Server 2008 R2 and HPC Pack 2008 R2-based clusters with Service Pack 2. In November last year, however, Microsoft quietly announced that it no longer planned to pursue Dryad as a commercial product just as it announced Hadoop connectors to SQL Server and Windows Azure.

Microsoft and Oracle have muscle in RDBMS. Oracle sells half the planet’s relational databases in a market worth $29bn with Microsoft in third place. The more open-source friendly IBM – which announced Hadoop connectors to its DB2 database around the same time as Oracle – is second.

It's not just Cutting, the father of Hadoop, who felt concern. Hortonworks, the start-up that spun out of Yahoo! last year with venture backing from Red Hat and JBoss investor Rob Bearden, and which competes with Cutting’s Cloudera, was also worried by what Microsoft and Oracle might do.

A thousand tiny elephants

Eric Baldeschwieler, Hortonworks’ chief technology officer, breathed what could be called a sigh of relief when Oracle last year announced its plans for a big data appliance using Hadoop. “It’s hugely validating of Hadoop, having all the major vendors coming in,” Baldeschwieler told The Reg at the time. “What we don’t want to see is thousands of flavours of Hadoop.”

Why did the giants suddenly turn friendly towards a technology that Cutting reckons will tread on the toes of their beloved RDBMS in about five years – when, as Cutting believes, it becomes an incumbent of mainstream enterprise IT?

“We are moving into a world where there’s lots of data,” Cutting says. “It [Hadoop] is not going to take over all software – there will be other technologies – but it’s going to become one of the mainstream staples in the next five to 10 years - maybe even sooner than that. It seems to be progressing pretty quickly.”
