Doug Cutting: Hadoop dodged a Microsoft-Oracle stomping

Elephant daddy on breaking into mainstream IT

RDBMS grows up

The roots of competition with RDBMS are already there: Hadoop Pig is a data analysis program and Hive a data warehousing system. Cutting says parts of data warehousing and ETL might well be subsumed as these grow.

You laugh? Well, it’s not like RDBMS owns data analysis, data warehousing or ETL; these have only become synonymous with RDBMS because the software giants – Microsoft, Oracle and IBM – have poured vast sums of money into developing the tools that added value to their software by allowing them to take more workloads off the mainframe and midrange systems of the day. “There might be some battles where some existing technologies get incorporated into the big data sack,” Cutting says.

Co-existence, though, is the watchword as Cutting reckons Hadoop can find a new niche working with RDBMS. “The sweet spot for the growth is not to try to displace things like that but rather to attack problems people are having when they use those things. Over time the technologies could creep upstream and erode neighbouring technologies but, right now, there’s enough new stuff to keep us busy.”

The RDBMS giants, it seems, agree with Cutting's view that Hadoop won't compete for enterprise transactional loads such as payroll and inventory.

The other big reason Microsoft and Oracle have lowered their defences is that Hadoop will actually reinforce their positions and the position of RDBMS. Hadoop will bring larger numbers of developers to their databases. These devs will build applications for big data and the web that – in part – will use information held in RDBMS. Hadoop is built using open-source Java while the Avro project allows compiling in Java, C, C++, C#, Python and Ruby.

The big companies might be buying into Hadoop right now, but they could still pose problems and could help contribute to some kind of fragmentation down the road. That's because the RDBMS giants have taken sides: Microsoft has chosen to work with Hortonworks, formed in June 2011 with the engineering team who’d worked with Cutting on Hadoop but who’d remained at Yahoo!. IBM and Oracle, meanwhile, have gone with Cloudera, where Cutting is architect.

Cloudera and Hortonworks implement different modules of Hadoop in their distributions. You can compare the full list for Cloudera CDH here and Hortonworks' Data Platform here (warning PDF). Hortonworks has also added third-party software to its module mix: Talend's Open Studio.

There has already been tension between Cutting’s company and Hortonworks. They got into a bloggy spat last year over who contributed which code to Apache (fixes versus new features).

At the time of the spat, Baldeschwieler told The Reg this was business as usual for open source, but he reckoned Hortonworks and Cloudera are united in the common cause of improving Hadoop. “It’s getting better,” Baldeschwieler said of Hortonworks’ relationship with Cloudera, adding: “There’s name-calling in all open-source projects...”

“At this point,” he continued, “there’s an obvious consensus that Cloudera and Hortonworks are equally focused on making Hadoop better.”

Baldeschwieler called Hortonworks' partnership with Microsoft “an example of building a strong relationship to take Hadoop to more customers.”

Cutting hopes Hadoop can stay united using BigTop, which acts as a kind of reference model. BigTop is an Apache project that integrates core Hadoop with the Zookeeper, HBase, Hive, Pig, Mahout, Oozie, Sqoop, Flume and Whirr modules and with versions of Fedora, CentOS, Red Hat Enterprise Linux, SuSE Linux and Ubuntu. The basis of BigTop is Cloudera’s CDH and the idea is that all future versions of CDH will come from BigTop.

Cutting reckons BigTop will align CDH to the official Apache Hadoop project. While Cloudera has tried to align CDH to Apache releases, gaps have sometimes emerged as bugfixes and some features are back-ported to CDH that correspond to earlier versions of the Apache release, Cutting said.

Cutting doesn’t see fragmentation as a big problem but did note that fewer versions of Hadoop would be a good thing – it would make things easier for developers and, presumably, better for firms like Cloudera, which hopes to build a Hadoop business based upon the distro which the market likes the most. “Fewer distributions would make it easier for developers, since they'd have fewer combinations of versions to support,” Cutting said.

Coding inside the BigTop

To mean anything, however, BigTop will need everybody – not just Cloudera – to support its development and to swallow up the code base into their distros. Currently, BigTop is rather Cloudera-centric. According to this blog post, and based on the Hortonworks HDP data sheet (warning PDF), Hortonworks is only “parts” of BigTop.

“We're hoping that other distributors will join us in collaborating on BigTop to further reduce any such fragmentation issues,” Cutting told The Reg. “We currently have lots of folks collaborating well on these projects who don't share a distribution. Ideally we'll all start collaborating through BigTop and the various distributions will interoperate easily.”

BigTop comes as Cutting reckons further changes are needed to help Hadoop hit its potential – and establish that footing it aims for in mainstream IT. Changes won't be big, he says. Instead, they will be refinements.
