Doug Cutting: Hadoop dodged a Microsoft-Oracle stomping

Elephant daddy on breaking into mainstream IT

RDBMS grows up

The roots of competition with RDBMS are already there: Hadoop Pig is a data analysis program and Hive a data warehousing system. Cutting says parts of data warehousing and ETL might well be subsumed as these grow.

You laugh? Well, it’s not like RDBMS owns data analysis, data warehousing or ETL; these have only become synonymous with RDBMS because the software giants – Microsoft, Oracle and IBM – have poured vast sums of money into developing the tools that added value to their software by allowing them to take more workloads off the mainframe and midrange systems of the day. “There might be some battles where some existing technologies get incorporated into the big data sack,” Cutting says.

Co-existence, though, is the watchword as Cutting reckons Hadoop can find a new niche working with RDBMS. “The sweet spot for the growth is not to try to displace things like that but rather to attack problems people are having when they use those things. Over time the technologies could creep upstream and erode neighbouring technologies but, right now, there’s enough new stuff to keep us busy.”

The RDBMS giants, it seems, agree with Cutting's view that Hadoop won't compete for enterprise transactional loads such as payroll and inventory.

The other big reason Microsoft and Oracle have lowered their defences is that Hadoop will actually reinforce their positions and the position of RDBMS. Hadoop will bring larger numbers of developers to their databases. These devs will build applications for big data and the web that – in part – will use information held in RDBMS. Hadoop is built using open-source Java while the Avro project allows compiling in Java, C, C++, C#, Python and Ruby.

The big companies might be buying into Hadoop right now, but they could still pose problems and could help contribute to some kind of fragmentation down the road. That's because the RDBMS giants have taken sides: Microsoft has chosen to work with Hortonworks, formed in June 2011 with the engineering team who’d worked with Cutting on Hadoop but who’d remained at Yahoo!. IBM and Oracle, meanwhile, have gone with Cloudera, where Cutting is architect.

Cloudera and Hortonworks implement different modules of Hadoop in their distributions. You can compare the full list for Cloudera CDH here and Hortonworks' Data Platform here (warning PDF). Hortonworks has also added third-party software to its module mix: Talend's Open Studio.

There has already been tension between Cutting’s company and Hortonworks. They got into a bloggy spat last year over who contributed which code to Apache (fixes versus new features).

At the time of the spat, Baldeschwieler told The Reg this was business as usual for open source but he reckoned Hortonworks and Cloudera are united on the common cause of improving Hadoop. “It’s getting better,” Baldeschwieler said of Hortonworks’ relationship with Cloudera, adding: “There’s name-calling in all open-source projects...”

“At this point,” he continued, “there’s an obvious consensus that Cloudera and Hortonworks are equally focused on making Hadoop better.”

Baldeschwieler called Hortonworks' partnership with Microsoft “an example of building a strong relationship to take Hadoop to more customers.”

Cutting hopes Hadoop can stay united using BigTop, which acts as a kind of reference model. BigTop is an Apache project that integrates core Hadoop with the Zookeeper, HBase, Hive, Pig, Mahout, Oozie, Sqoop, Flume and Whirr modules and with versions of Fedora, CentOS, Red Hat Enterprise Linux, SuSE Linux and Ubuntu. The basis of BigTop is Cloudera’s CDH and the idea is that all future versions of CDH will come from BigTop.

Cutting reckons BigTop will align CDH to the official Apache Hadoop project. While Cloudera has tried to align CDH to Apache releases, gaps have sometimes emerged as bugfixes and some features are back-ported to CDH that correspond to earlier versions of the Apache release, Cutting said.

Cutting doesn’t see fragmentation as a big problem but did note that fewer versions of Hadoop would be a good thing – it would make things easier for developers and, presumably, better for firms like Cloudera, which hopes to build a Hadoop business based upon the distro which the market likes the most. “Fewer distributions would make it easier for developers, since they'd have fewer combinations of versions to support,” Cutting said.

Coding inside the BigTop

To mean anything, however, BigTop will need everybody – not just Cloudera – to support its development and to swallow up the code base into their distros. Currently, BigTop is rather Cloudera-centric. According to this blog post, and based on the Hortonworks HDP data sheet (warning PDF), Hortonworks includes only “parts” of BigTop.

“We're hoping that other distributors will join us in collaborating on BigTop to further reduce any such fragmentation issues,” Cutting told The Reg. “We currently have lots of folks collaborating well on these projects who don't share a distribution. Ideally we'll all start collaborating through BigTop and the various distributions will interoperate easily.”

BigTop comes as Cutting reckons further changes are needed to help Hadoop hit its potential – and establish that footing it aims for in mainstream IT. Changes won't be big, he says. Instead, they will be refinements.
