Doug Cutting: Hadoop dodged a Microsoft-Oracle stomping

Elephant daddy on breaking into mainstream IT

RDBMS grows up

The roots of competition with RDBMS are already there: Hadoop's Pig is a data analysis platform and Hive is a data warehousing system. Cutting says parts of data warehousing and ETL might well be subsumed as these projects grow.

You laugh? Well, it’s not like RDBMS owns data analysis, data warehousing or ETL; these have only become synonymous with RDBMS because the software giants – Microsoft, Oracle and IBM – have poured vast sums of money into developing tools that added value to their databases by letting customers move more workloads off the mainframe and midrange systems of the day. “There might be some battles where some existing technologies get incorporated into the big data sack,” Cutting says.

Co-existence, though, is the watchword as Cutting reckons Hadoop can find a new niche working with RDBMS. “The sweet spot for the growth is not to try to displace things like that but rather to attack problems people are having when they use those things. Over time the technologies could creep upstream and erode neighbouring technologies but, right now, there’s enough new stuff to keep us busy.”

The RDBMS giants, it seems, agree with Cutting's view that Hadoop won't compete for enterprise transactional loads such as payroll and inventory.

The other big reason Microsoft and Oracle have lowered their defences is that Hadoop will actually reinforce their positions and the position of RDBMS. Hadoop will bring larger numbers of developers to their databases. These devs will build applications for big data and the web that – in part – will use information held in RDBMS. Hadoop itself is written in Java, while the Avro data serialization project supports Java, C, C++, C#, Python and Ruby.

The big companies might be buying into Hadoop right now, but they could still cause problems and contribute to fragmentation down the road. That's because the RDBMS giants have taken sides: Microsoft has chosen to work with Hortonworks, formed in June 2011 by the engineering team who’d worked with Cutting on Hadoop but who’d remained at Yahoo!. IBM and Oracle, meanwhile, have gone with Cloudera, where Cutting is architect.

Cloudera and Hortonworks implement different modules of Hadoop in their distributions. You can compare the full list for Cloudera CDH here and Hortonworks' Data Platform here (warning: PDF). Hortonworks has also added third-party software to its module mix: Talend's Open Studio.

There has already been tension between Cutting’s company and Hortonworks. They got into a bloggy spat last year over who contributed which code to Apache (fixes versus new features).

At the time of the spat, Baldeschwieler told The Reg this was business as usual for open source, but he reckoned Hortonworks and Cloudera were united in the common cause of improving Hadoop. “It’s getting better,” Baldeschwieler said of Hortonworks’ relationship with Cloudera, adding: “There’s name-calling in all open-source projects.”

“At this point,” he continued, “there’s an obvious consensus that Cloudera and Hortonworks are equally focused on making Hadoop better.”

Baldeschwieler called Hortonworks' partnership with Microsoft “an example of building a strong relationship to take Hadoop to more customers.”

Cutting hopes Hadoop can stay united using BigTop, which acts as a kind of reference model. BigTop is an Apache project that integrates core Hadoop with the ZooKeeper, HBase, Hive, Pig, Mahout, Oozie, Sqoop, Flume and Whirr modules and with versions of Fedora, CentOS, Red Hat Enterprise Linux, SUSE Linux and Ubuntu. The basis of BigTop is Cloudera’s CDH, and the idea is that all future versions of CDH will come from BigTop.

Cutting reckons BigTop will align CDH with the official Apache Hadoop project. While Cloudera has tried to keep CDH aligned with Apache releases, gaps have sometimes emerged because bug fixes and some features get back-ported to CDH releases that are based on earlier Apache versions, Cutting said.

Cutting doesn’t see fragmentation as a big problem but did note that fewer versions of Hadoop would be a good thing – it would make things easier for developers and, presumably, better for firms like Cloudera, which hopes to build a Hadoop business based upon the distro which the market likes the most. “Fewer distributions would make it easier for developers, since they'd have fewer combinations of versions to support,” Cutting said.

Coding inside the BigTop

To mean anything, however, BigTop will need everybody – not just Cloudera – to support its development and to swallow up the code base into their distros. Currently, BigTop is rather Cloudera-centric. According to this blog post, and based on the Hortonworks HDP data sheet (warning: PDF), Hortonworks uses only “parts” of BigTop.

“We're hoping that other distributors will join us in collaborating on BigTop to further reduce any such fragmentation issues,” Cutting told The Reg. “We currently have lots of folks collaborating well on these projects who don't share a distribution. Ideally we'll all start collaborating through BigTop and the various distributions will interoperate easily.”

BigTop comes as Cutting reckons further changes are needed to help Hadoop hit its potential – and establish that footing it aims for in mainstream IT. Changes won't be big, he says. Instead, they will be refinements.
