Doug Cutting: Hadoop dodged a Microsoft-Oracle stomping

Elephant daddy on breaking into mainstream IT

RDBMS grows up

The roots of competition with RDBMS are already there: Hadoop Pig is a data analysis program and Hive a data warehousing system. Cutting says parts of data warehousing and ETL might well be subsumed as these grow.

You laugh? Well, it’s not like RDBMS owns data analysis, data warehousing or ETL; these have only become synonymous with RDBMS because the software giants – Microsoft, Oracle and IBM – have poured vast sums of money into developing the tools that added value to their software by allowing them to take more workloads off the mainframe and midrange systems of the day. “There might be some battles where some existing technologies get incorporated into the big data sack,” Cutting says.

Co-existence, though, is the watchword as Cutting reckons Hadoop can find a new niche working with RDBMS. “The sweet spot for the growth is not to try to displace things like that but rather to attack problems people are having when they use those things. Over time the technologies could creep upstream and erode neighbouring technologies but, right now, there’s enough new stuff to keep us busy.”

The RDBMS giants, it seems, agree with Cutting's view that Hadoop won't compete for enterprise transactional loads such as payroll and inventory.

The other big reason Microsoft and Oracle have lowered their defences is that Hadoop will actually reinforce their positions and the position of RDBMS. Hadoop will bring larger numbers of developers to their databases. These devs will build applications for big data and the web that – in part – will use information held in RDBMS. Hadoop is built using open-source Java while the Avro project allows compiling in Java, C, C++, C#, Python and Ruby.

The big companies might be buying into Hadoop right now, but they could still pose problems and could help contribute to some kind of fragmentation down the road. That's because the RDBMS giants have taken sides: Microsoft has chosen to work with Hortonworks, formed in June 2011 with the engineering team who’d worked with Cutting on Hadoop but who’d remained at Yahoo!. IBM and Oracle, meanwhile, have gone with Cloudera, where Cutting is architect.

Cloudera and Hortonworks implement different modules of Hadoop in their distributions. You can compare the full list for Cloudera CDH here and Hortonworks' Data Platform here (warning PDF). Hortonworks has also added third-party software to its module mix: Talend's Open Studio.

There has already been tension between Cutting’s company and Hortonworks. They got into a bloggy spat last year over who contributed which code to Apache (fixes versus new features).

At the time of the spat, Baldeschwieler told The Reg this was business as usual for open source but he reckoned Hortonworks and Cloudera are united on the common cause of improving Hadoop. “It’s getting better,” Baldeschwieler said of Hortonworks’ relationship with Cloudera, adding: “There’s name-calling in all open-source projects...

“At this point,” he continued, “there’s an obvious consensus that Cloudera and Hortonworks are equally focused on making Hadoop better.”

Baldeschwieler called Hortonworks' partnership with Microsoft “an example of building a strong relationship to take Hadoop to more customers.”

Cutting hopes Hadoop can stay united using BigTop, which acts as a kind of reference model. BigTop is an Apache project that integrates core Hadoop with the Zookeeper, HBase, Hive, Pig, Mahout, Oozie, Sqoop, Flume and Whirr modules and with versions of Fedora, CentOS, Red Hat Enterprise Linux, SuSE Linux and Ubuntu. The basis of BigTop is Cloudera’s CDH and the idea is that all future versions of CDH will come from BigTop.

Cutting reckons BigTop will align CDH to the official Apache Hadoop project. While Cloudera has tried to align CDH to Apache releases, gaps have sometimes emerged because bug fixes and some features are back-ported to CDH versions that correspond to earlier versions of the Apache release, Cutting said.

Cutting doesn’t see fragmentation as a big problem but did note that fewer versions of Hadoop would be a good thing – it would make things easier for developers and, presumably, better for firms like Cloudera, which hopes to build a Hadoop business based upon the distro which the market likes the most. “Fewer distributions would make it easier for developers, since they'd have fewer combinations of versions to support,” Cutting said.

Coding inside the BigTop

To mean anything, however, BigTop will need everybody – not just Cloudera – to support its development and to swallow up the code base into their distros. Currently, BigTop is rather Cloudera-centric. According to this blog post, and based on the Hortonworks HDP data sheet (warning PDF), Hortonworks is only “parts” of BigTop.

“We're hoping that other distributors will join us in collaborating on BigTop to further reduce any such fragmentation issues,” Cutting told The Reg. “We currently have lots of folks collaborating well on these projects who don't share a distribution. Ideally we'll all start collaborating through BigTop and the various distributions will interoperate easily.”

BigTop comes as Cutting reckons further changes are needed to help Hadoop hit its potential – and establish that footing it aims for in mainstream IT. Changes won't be big, he says. Instead, they will be refinements.
