
Doug Cutting: Hadoop dodged a Microsoft-Oracle stomping

Elephant daddy on breaking into mainstream IT


Interview We’ve all heard plenty about open source changing the dynamics of the tech industry and upsetting the old order. Open source, we’re told, is manifest destiny. Companies that ignore it will be consigned to history and CIOs who assert there’s no freebie code behind their firewalls are out of touch with devs happily humming to Tomcat, Apache, Linux and PHP. At least that's how the story goes.

One open-source success of recent years has been Hadoop, the Apache-licensed implementation of Google’s MapReduce, which quickly and efficiently processes petabytes of data using clusters of ordinary x86 servers.

MapReduce works by splitting massive data-processing jobs into chunks that are processed locally and in parallel (the map phase), and then re-combining the results (the reduce phase) at the end. It means you don’t need big, centralised servers such as mainframes or SPARC boxes; it's a gift to x86 computing.
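To make the two phases concrete, here is a toy, single-process word count in Python. It is a minimal sketch that assumes plain-text input already split into chunks. The function names are hypothetical, and none of this is Hadoop's actual Java API. A real cluster would run the map step on many machines at once, each working on the chunk stored locally, and would move data between nodes during the grouping ("shuffle") step.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_pairs):
    """Group every emitted value by its key, as the framework would
    do before handing the groups to the reducers."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all the values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    # Assumption: on a real cluster each chunk is mapped in parallel on the
    # node that holds it; here the chunks are simply processed in sequence.
    mapped = chain.from_iterable(map_phase(c) for c in chunks)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    chunks = ["the elephant never forgets", "The elephant is yellow"]
    print(word_count(chunks))
    # → {'the': 2, 'elephant': 2, 'never': 1, 'forgets': 1, 'is': 1, 'yellow': 1}
```

Because each map call looks only at its own chunk, the map work can be spread across as many machines as there are chunks; only the much smaller grouped results need to be brought together for the reduce step.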

In the eight years since Hadoop was first written by Doug Cutting and Mike Cafarella, it has found a home running Amazon.com, Facebook and Yahoo! – some of the biggest sites on the web – among others. In the last eight months alone, Hadoop has won the backing of Microsoft, IBM and Oracle – three of the biggest names in relational databases. The giants are now supporting a technology that NoSQL zealots believed would actually kill the RDBMS.

Microsoft is working with Hadoop start-up Hortonworks on connectors between Hadoop and its databases, Windows and the Azure cloud, while Oracle is marrying Hadoop with its open-source MySQL database and bundling the result with Sun Microsystems server hardware to produce yet another Oracle appliance.


Yet, with history apparently on his side, you might be surprised to learn that Cutting feared Oracle and Microsoft might try to stand up to Hadoop, with disastrous consequences for the ecosystem. He said he is “gratified” that the pair decided to come on board.

“I’m pleased they decided not to fight it [Hadoop] with some proprietary solution, but to join forces with the open source one,” Cutting told The Reg. “It means those would otherwise [have been] two potential sources of serious competition, and to grow the community with two companies as big and powerful as Oracle and Microsoft is tremendous.

“I’m really gratified they have elected not to [develop proprietary solutions]. It’s a good thing for Hadoop for sure... I no longer see a formidable competitor, which is a little frightening,” Cutting said.

“[It’s] frightening and exciting at the same time because it’s something you have to worry about, to win them over and convince them that this is a better approach. It’s gratifying when you haven’t got to do that.”

Microsoft and Oracle could have forked Hadoop’s code, building versions tailored to their own systems and thereby splitting the community in two: those supporting Hadoop for the large populations of Oracle and SQL Server users, and everybody else.

Not possible, you say? Oracle has played politics before to get its way – with disastrous results for open-source projects. Oracle pulled OpenSolaris, the open-source Solaris project, back in-house in 2010, allowing the fledgling effort that Sun had blessed and spun up to die. Oracle’s control of OpenOffice led to the LibreOffice fork in 2011, while its reluctance to let go of the Hudson build management system saw almost the entire community leave to create the rival Jenkins.

The legacy of such actions: forked codebases and rival claims over which is the one "true" project. Oracle has the brands, but the community has the code.

Then there’s Microsoft. Redmond is a strategic friend to open source, supporting projects where they help sell more copies of Windows or at least prevent lost sales. So far it has worked with Linux, MySQL and PHP, and cuddled up to Eclipse on Silverlight.

On big data, Microsoft had been building a Hadoop-esque architecture since 2006. Called Dryad, it was designed to “efficiently” process huge data loads on clusters based on Windows HPC Server 2008 R2 and HPC Pack 2008 R2 with Service Pack 2. In November last year, however, Microsoft quietly announced that it no longer planned to pursue Dryad as a commercial product, just as it announced Hadoop connectors for SQL Server and Windows Azure.

Microsoft and Oracle have muscle in RDBMS. Oracle sells half the planet’s relational databases in a market worth $29bn, with Microsoft in third place. Second is the more open-source-friendly IBM, which announced Hadoop connectors for its DB2 database around the same time as Oracle.

It's not just Cutting, the father of Hadoop, who felt concern. Hortonworks, the start-up that spun out of Yahoo! last year with venture backing from Red Hat and JBoss investor Rob Bearden and competes with Cutting’s Cloudera, was also worried by what Microsoft and Oracle might do.

A thousand tiny elephants

Eric Baldeschwieler, Hortonworks’ chief technology officer, breathed what could be called a sigh of relief when Oracle last year announced its plans for a big data appliance using Hadoop. “It’s hugely validating of Hadoop, having all the major vendors coming in,” Baldeschwieler told The Reg at the time. “What we don’t want to see is thousands of flavours of Hadoop.”

Why did the giants suddenly turn friendly towards a technology that Cutting reckons will tread on the toes of their beloved RDBMS in about five years, when he believes it will have become a fixture of mainstream enterprise IT?

“We are moving into a world where there’s lots of data,” Cutting says. “It [Hadoop] is not going to take over all software – there will be other technologies – but it’s going to become one of the mainstream staples in the next five to 10 years – maybe even sooner than that. It seems to be progressing pretty quickly.”

