Hadoop distie MapR trousers another $30m to take on big data rivals

Working towards that eventual IPO, if it isn't eaten first

MapR Technologies, one of the commercializers of the Hadoop big data muncher, has pocketed another $30m to help it ramp up its business and keep it on track for what the company hopes will be an initial public offering.

While Cloudera was out of the gate early commercializing the Hadoop big data muncher, MapR was close behind (by a matter of weeks) and no Hadoop distie has yet emerged as the inevitable Red Hat for fat and fast data.

There are plenty of other contenders, all of them doing interesting things to and with Hadoop, including (in no particular order) MapR, the Hortonworks direct spinout from Yahoo!, the spinning-out Pivotal unit of EMC, IBM (which has sold its own BigInsights variant of Hadoop for a few years) and now Intel, which has just announced its own Hadoop distro.

What is amazing is that Yahoo! spun Hortonworks out in the first place instead of leveraging it as a strategic asset, and that software-hungry Hewlett-Packard and Dell have not snapped up Cloudera or MapR to build out their software portfolios.

Every day that passes, these companies get more and more expensive, to the point where both must be tempted to either give up on owning their own distributions or grab the various Apache components and start up one of their own.

With the big data market (which means subscription support for open source components plus licensing for proprietary software extensions and the hardware to run it) expected to reach $5bn in revenues by 2016 or so, there would seem to be plenty of room for multiple contenders. Markets have tended in the past to create a few dominant players, and MapR wants to be one of them in the big data world.

But with the advent of cloud platform services like Amazon Web Services' Elastic MapReduce, Google's BigQuery, or the eponymous service from Splunk, many companies may simply never install their own big data software. And still others with the technical resources may decide that Hadoop is strategic enough of an infrastructure/application layer that they build their own competence.

And so it is not a foregone conclusion at this point in the big data game that Hadoop will precisely track the history of the Linux operating system or that a dominant player like Red Hat will emerge. The market could remain highly fragmented.

None of the Hadoop disties want to think about that possibility, and they certainly want to be able to leverage what must be some pretty high multiples to either go public or sell out to the tier one IT system suppliers who are desperate to build up their software and services businesses.

"We've got a management team that is not looking for a quick exit," Jack Norris, vice president of TKTK, tells El Reg. "This is a paradigm shift, this is a new architecture. We are focused on an IPO, and John has the Splunk IPO on his desk and he looks at it often. We think we have an even bigger opportunity." Norris was referring to John Schroeder, co-founder and CEO of MapR.

MapR's equity backers think it has a bigger opportunity than Splunk, too. In the first two rounds of funding from Lightspeed Venture Partners, Redpoint Ventures, and NEA, MapR was able to raise $29m and get several generations of Hadoop distributions into the field. The company, being privately held, does not provide revenue figures or customer counts, but has grown to 150 employees. The company's second round helped MapR open offices in London and Munich as part of its expansion in Europe.

This time around with the $30m in Series C funding, Mayfield Fund is leading the investment (with all three other equity players kicking in more dough), and Norris says the plan is to use it to expand into Asia while at the same time boosting its research and development to extend the MapR Hadoop stack.

The current M7 Hadoop distro marries MapR's innovative file system, which makes the Hadoop Distributed File System (HDFS) look like NFS to applications, with the HBase data warehousing layer for HDFS to significantly speed up SQL-like queries on Hadoop clusters.

That HBase speedup debuted back in October 2012. It basically pushes HDFS down into MapR's distributed NFS file system, shards both data chunks and portions of HBase tables, and spreads them around the cluster for performance, but presents them as unified data and tables to applications.
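
For the code-minded, the sharding idea can be sketched in a few lines of Python. This is a toy illustration only and makes no claim about how MapR actually does it: the `ShardedTable` class, the node names, and the hash-based placement are all assumptions made up for the example. Rows are spread across several hypothetical nodes, while `get` and `scan` give the caller what looks like a single table.

```python
import hashlib


class ShardedTable:
    """Toy table whose rows live on several nodes but read as one table.

    Hypothetical illustration only; not MapR's or HBase's implementation.
    """

    def __init__(self, node_names):
        # One in-memory dict stands in for each (hypothetical) cluster node.
        self.nodes = {name: {} for name in node_names}
        self._names = sorted(node_names)

    def _node_for(self, row_key):
        # A stable hash so the same key always lands on the same node.
        digest = hashlib.md5(row_key.encode("utf-8")).hexdigest()
        return self._names[int(digest, 16) % len(self._names)]

    def put(self, row_key, value):
        # Store the row on whichever node the key hashes to.
        self.nodes[self._node_for(row_key)][row_key] = value

    def get(self, row_key):
        # The caller never needs to know which node holds the row.
        return self.nodes[self._node_for(row_key)].get(row_key)

    def scan(self):
        # Unified view: merge every shard and return rows in key order.
        rows = {}
        for shard in self.nodes.values():
            rows.update(shard)
        return sorted(rows.items())
```

A real system would also have to deal with replication, rebalancing, and node failure, none of which this sketch attempts.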

MapR is very keen on its Apache Drill add-on for Hadoop, which is trying to bring to the Hadoop stack the realtime, interactive querying that relational databases have had for decades. Just as HBase sort of clones Google's BigTable overlay for its Google File System, Drill mimics Google's Dremel query tool, which uses an SQL-alike language called DrQL. Both Drill and the Google BigQuery service support DrQL.

All of the Hadoop disties are, of course, chasing the same dream. Cloudera has its Project Impala layer for HDFS to replace the Hive SQL-alike query language for HBase, and EMC's Pivotal group spinoff announced last week has taken the SQL guts out of the Greenplum parallel database and woven it into HDFS to create Project Hawq, which speaks actual SQL to sort through data stored in HDFS.

MapR is still the only Hadoop distie that can make HDFS speak NFS, but all of the big players are working on something that tries to make HDFS speak SQL, the default query language for relational databases, to one degree or another.

The investment by Mayfield Fund is not a particularly good indicator of whether MapR will end up being sold or will actually make a debut on Wall Street. The venture capital firm, established in 1969, has invested in over 500 companies. Of these, more than 100 have been sold off in mergers or acquisitions and more than 100 have gone public. ®
