The Register® — Biting the hand that feeds IT


Hadoop distie MapR trousers another $30m to take on big data rivals

Working towards that eventual IPO, if it isn't eaten first


MapR Technologies, one of the commercializers of the Hadoop big data muncher, has pocketed another $30m to help it ramp up its business and keep it on track for what the company hopes will be an initial public offering.

While Cloudera was out of the gate early commercializing the Hadoop big data muncher, MapR was close behind (by a matter of weeks) and no Hadoop distie has yet emerged as the inevitable Red Hat for fat and fast data.

There are plenty of other contenders, all of them doing interesting things to and with Hadoop, including (in no particular order) MapR, the Hortonworks direct spinout from Yahoo!, the spinning-out Pivotal unit of EMC, IBM (which has sold its own BigInsights variant of Hadoop for a few years) and now Intel, which has just announced its own Hadoop distro.

What is amazing is that Yahoo! spun Hortonworks out in the first place instead of leveraging it as a strategic asset, and that software-hungry Hewlett-Packard and Dell have not snapped up Cloudera or MapR to build out their software portfolios.

Every day that passes, these companies get more expensive, to the point where both HP and Dell must be tempted to either give up on the idea of owning a distribution or grab the various Apache components and roll their own.

With the big data market (which means subscription support for open source components plus licensing for proprietary software extensions and the hardware to run it) expected to reach $5bn in revenues by 2016 or so, there would seem to be plenty of room for multiple contenders. That said, markets have tended in the past to consolidate around a few dominant players, and MapR wants to be one of them in the big data world.

But with the advent of cloud platform services like Amazon Web Services' Elastic MapReduce, Google's BigQuery, or the eponymous service from Splunk, many companies may simply never install their own big data software. And still others with the technical resources may decide that Hadoop is a strategic enough infrastructure/application layer that they should build their own competence in-house.

And so it is not a foregone conclusion at this point in the big data game that Hadoop will precisely track the history of the Linux operating system or that a dominant player like Red Hat will emerge. The market could remain highly fragmented.

None of the Hadoop disties want to think about that possibility, and they certainly want to be able to leverage what must be some pretty high multiples to either go public or sell out to the tier one IT system suppliers who are desperate to build up their software and services businesses.

"We've got a management team that is not looking for a quick exit," Jack Norris, vice president of TKTK, tells El Reg. "This is a paradigm shift, this is a new architecture. We are focused on an IPO, and John has the Splunk IPO on his desk and he looks at it often. We think we have an even bigger opportunity." Norris was referring to John Schroeder, co-founder and CEO of MapR.

MapR's equity backers think it has a bigger opportunity than Splunk, too. In the first two rounds of funding from Lightspeed Venture Partners, Redpoint Ventures, and NEA, MapR was able to raise $29m and get several generations of Hadoop distributions into the field. The company, being privately held, does not provide revenue figures or customer counts, but has grown to 150 employees. The company's second round helped MapR open offices in London and Munich as part of its expansion in Europe.

This time around with the $30m in Series C funding, Mayfield Fund is leading the investment (with all three other equity players kicking in more dough), and Norris says the plan is to use it to expand into Asia while at the same time boosting its research and development to extend the MapR Hadoop stack.

The current M7 Hadoop distro marries MapR's innovative file system, which makes the Hadoop Distributed File System (HDFS) look like NFS to applications, with the HBase data store layer for HDFS, to significantly speed up HBase queries on Hadoop clusters.

That HBase speedup debuted back in October 2012. It essentially pushes the HBase tables down into MapR's distributed, NFS-style file system, sharding both data chunks and portions of HBase tables and spreading them around the cluster for performance, while presenting them as unified data and tables to applications.
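To make the NFS angle concrete, here is a minimal Python sketch of what it buys: once a cluster is mounted as an ordinary file system, any POSIX-aware program can read and write job data with plain file I/O, no Hadoop client library required. The mount point below is simulated with a temporary directory; a real MapR mount would sit at a path along the lines of /mapr/&lt;cluster-name&gt; (an assumption here, for illustration only).

```python
import os
import tempfile

# Simulate an NFS-mounted cluster with a temporary directory.
# (A real deployment would mount the cluster at a path such as
# /mapr/<cluster-name>; the path here is a stand-in.)
mount_point = tempfile.mkdtemp()

# A job writes a result file "into the cluster" with plain POSIX I/O --
# no HDFS client API, no `hdfs dfs -put`, just open() and write().
result_path = os.path.join(mount_point, "part-00000")
with open(result_path, "w") as f:
    f.write("hello\tworld\n")

# Any ordinary tool or script can then read it straight back.
with open(result_path) as f:
    print(f.read().strip())  # prints the tab-separated record
```

On a stock Hadoop cluster, by contrast, that file would only be reachable through the HDFS client API or command-line tools.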

MapR is very keen on its Apache Drill add-on for Hadoop, which aims to bring the kind of realtime, interactive querying that relational databases have offered for decades to the Hadoop stack. Just as HBase more or less clones the BigTable overlay Google built atop its Google File System, Drill mimics Google's Dremel query tool. Drill's SQL-like query language, DrQL, is designed to be compatible with the language used by Dremel and the Google BigQuery service.

All of the Hadoop disties are, of course, chasing the same dream. Cloudera has its Project Impala engine, which bypasses the Hive SQL-alike query layer to run interactive queries directly against data in HDFS, and EMC's Pivotal group spinoff, announced last week, has taken the SQL guts out of the Greenplum parallel database and woven them into HDFS to create Project Hawq, which speaks actual SQL to sort through data stored in HDFS.

MapR is still the only Hadoop distie that can make HDFS speak NFS, but all of the big players are working on something that tries to make HDFS speak SQL, the default query language for relational databases, to one degree or another.
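As a rough illustration of what "making HDFS speak SQL" means for users, the sketch below uses Python's built-in sqlite3 as a stand-in query engine; Hive, Impala, Drill, and Hawq all aim to let analysts run exactly this kind of familiar aggregate query against files that live on the cluster rather than in a database. The table name and data are invented for the example.

```python
import sqlite3

# sqlite3 stands in for a SQL-on-Hadoop engine (Hive, Impala, Drill,
# Hawq); the `clicks` table mimics log files that would normally sit
# as raw data in HDFS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (page TEXT, hits INTEGER)")
conn.executemany("INSERT INTO clicks VALUES (?, ?)",
                 [("home", 120), ("pricing", 45), ("home", 80)])

# The kind of aggregate query an analyst would point at the cluster:
rows = conn.execute(
    "SELECT page, SUM(hits) FROM clicks GROUP BY page ORDER BY page"
).fetchall()
print(rows)  # -> [('home', 200), ('pricing', 45)]
```

The draw for the Hadoop disties is that the query, not the storage, is what the analyst has to learn, and SQL is the language they already know.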

The investment by Mayfield Fund is not a particularly good indicator of whether MapR will end up being sold or will actually make a debut on Wall Street. The venture capital firm, established in 1969, has invested in over 500 companies. Of these, more than 100 have been sold off in mergers or acquisitions and more than 100 have gone public. ®

