Feeds

Doug Cutting: Hadoop dodged a Microsoft-Oracle stomping

Elephant daddy on breaking into mainstream IT

Choosing a cloud hosting partner with confidence

Interview We’ve all heard plenty about open source changing the dynamics of the tech industry and upsetting the old order. Open source, we’re told, is manifest destiny. Companies that ignore it will be consigned to history and CIOs who assert there’s no freebie code behind their firewalls are out of touch with devs happily humming to Tomcat, Apache, Linux and PHP. At least that's how the story goes.

One open-source success of recent years has been Hadoop, the Apache-licensed implementation of Google’s MapReduce, which quickly and efficiently processes petabytes of data using clusters of ordinary x86 servers.

MapReduce works by splitting up massive data-processing jobs into chunks to be processed locally and parallelising the computation (the map phase), and then re-combining the results (the reduce phase) at the end. It means you don’t need big, centralised servers like mainframes or SPARC servers; it's a gift to x86 computing.

In the eight years since Hadoop was first written by Doug Cutting and Mike Cafarella, it has found a home running Amazon.com, Facebook and Yahoo! – some of the biggest sites on the web – among others. In the last eight months alone, Hadoop has won the backing of Microsoft, IBM and Oracle – three of the biggest names in relational databases. The software giants are now supporting a piece of software that the NoSQL zealots believed would actually kill RDBMS.

Microsoft is writing connectors between its databases, Windows and Azure cloud and Hadoop with Hadoop start-up Hortonworks, while Oracle is marrying Hadoop with its open-source MySQL database and merging the result with some Sun Microsystems' server hardware to produce yet another Oracle appliance.

'I’m pleased they decided not to fight it [Hadoop] with some proprietary solution, but to join forces with the open source one' – Doug Cutting

Yet, with history apparently on his side, you'd be surprised to learn Cutting feared Oracle and Microsoft might try to stand up to Hadoop, with disastrous consequences for the ecosystem. He said he is "gratified" that the pair decided to come on board.

“I’m pleased they decided not to fight it [Hadoop] with some proprietary solution, but to join forces with the open source one,” Cutting told The Reg. “It means those would otherwise be two potential sources of serious competition and to grow the community with two companies as big and powerful as Oracle and Microsoft is tremendous.

“I’m really gratified they have elected not to [develop proprietary solutions]. It’s a good thing for Hadoop for sure... I no longer see a formidable competitor, which is a little frightening," Cutting said.

“[It’s] frightening and exciting at the same time because it’s something you have to worry about, to win them over and convince them that this is a better approach. It’s gratifying when you haven’t got to do that.”

Microsoft and Oracle could have forked Hadoop’s code, building versions of Hadoop tailored to their systems thereby splitting the community into those who support Hadoop for large communities of Oracle and SQL Server users, and everybody else.

Not possible you say? Oracle has played politics before to get its way – with disastrous results for open-source projects. Oracle pulled the open-source Solaris project, OpenSolaris, back in-house in 2010 – allowing the fledgling open-source effort that had been blessed and spun up by Sun to die. Oracle’s control of OpenOffice has produced the LibreOffice fork in 2011, while Oracle's reluctance to let go of the Hudson build management system saw almost the entire community leave to create the rival Jenkins.

The legacy of such actions: forked codebases and rival claims over which is the one "true" project. Oracle has the brands, but the community has the code.

Then there’s Microsoft. Redmond is a strategic friend to open source, supporting projects where they help sell more copies of Windows or at prevent lost sales. So far it has worked on Linux, MySQL, PHP, and cuddled up to Eclipse on Silverlight.

On big data, Microsoft had been building a Hadoop-esque architecture since 2006. Called Dryad, it would “efficiently” process huge data loads running on Windows HPC Server 2008 R2 and HPC Pack 2008 R2-based clusters with Service Pack 2. In November last year, however, Microsoft quietly announced that it no longer planned to pursue Dryad as a commercial product just as it announced Hadoop connectors to SQL Server and Windows Azure.

Microsoft and Oracle have muscle in RDBMS. Oracle sells half the planet’s relational databases in a market worth $29bn with Microsoft in third place. The more open-source friendly IBM – which announced Hadoop connectors to its DB2 database around the same time as Oracle – is second.

It's not just Cutting, the father of Hadoop, who felt concern. Hortonworks, the start-up that spun out of Yahoo! last year with venture backing from Red Hat and JBoss investor Rob Bearden and competes with Cutting’s Cloudera, was also worried by what Microsoft and Oracle might do.

A thousand tiny elephants

Eric Baldeschwieler, Hortonworks’ chief technology officer, breathed what could be called a sigh of relief when Oracle last year announced its plans for a big data appliance using Hadoop. “It’s hugely validating of Hadoop, having all the major vendors coming in,” Baldeschwieler told The Reg at the time. “What we don’t want to see is thousands of flavours of Hadoop.”

Why did the giants suddenly turn friendly towards a technology that Cutting reckons will tread on the toes of their beloved RDBMS in about five years – when, as Cutting believes, it becomes an incumbent of mainstream enterprise IT?

“We are moving into a world where there’s lots of data,” Cutting says. “It [Hadoop] is not going to take over all software – there will be other technologies – but it’s going to become one of the mainstream staples in the next five to 10 years - maybe even sooner than that. It seems to be progressing pretty quickly.”

Top 5 reasons to deploy VMware with Tegile

Next page: RDBMS grows up

More from The Register

next story
Euro Parliament VOTES to BREAK UP GOOGLE. Er, OK then
It CANNA do it, captain.They DON'T have the POWER!
Download alert: Nearly ALL top 100 Android, iOS paid apps hacked
Attack of the Clones? Yeah, but much, much scarier – report
NSA SOURCE CODE LEAK: Information slurp tools to appear online
Now you can run your own intelligence agency
Post-Microsoft, post-PC programming: The portable REVOLUTION
Code jockeys: count up and grab your fabulous tablets
Twitter App Graph exposes smartphone spyware feature
You don't want everyone to compile app lists from your fondleware? BAD LUCK
Microsoft adds video offering to Office 365. Oh NOES, you'll need Adobe Flash
Lovely presentations... but not on your Flash-hating mobe
prev story

Whitepapers

10 ways wire data helps conquer IT complexity
IT teams can automatically detect problems across the IT environment, spot data theft, select unique pieces of transaction payloads to send to a data source, and more.
Getting started with customer-focused identity management
Learn why identity is a fundamental requirement to digital growth, and how without it there is no way to identify and engage customers in a meaningful way.
How to determine if cloud backup is right for your servers
Two key factors, technical feasibility and TCO economics, that backup and IT operations managers should consider when assessing cloud backup.
Reg Reader Research: SaaS based Email and Office Productivity Tools
Read this Reg reader report which provides advice and guidance for SMBs towards the use of SaaS based email and Office productivity tools.
Security and trust: The backbone of doing business over the internet
Explores the current state of website security and the contributions Symantec is making to help organizations protect critical data and build trust with customers.