MapR revs up HBase queries with M7 Hadoop distro

Solr search engine means elephants don't need to chew big data cud

You are not just imagining it. Every commercial distributor of the Hadoop system for storing and chewing through unstructured data has come up with its own way to deliver something akin to SQL query functionality while at the same time boosting the speed of ad hoc queries.

MapR Technologies is one of the earlier Hadoop disties and cooked up a proprietary file system that can look like the familiar Network File System of Unix and Linux or the Hadoop Distributed File System. The company promised last fall to work its file system magic on the HBase database layer in Hadoop, and now that code is ready for prime-time.

MapR outlined its HBase acceleration plans back in October when it previewed its top-end M7 Edition. MapR now has three different distributions.

M3 Edition is the freebie community version that allows users to mount HDFS just as if it were an NFS volume, among other extra goodies that are added to the Apache Hadoop distribution; it has no commercial support, only community support through the web.

With the M5 Edition, MapR offers tech support for a fee and turns on other features of its super-secret file system that allow for mirroring and snapshots of data sets, high availability clustering for the Hadoop JobTracker, a distributed NameNode (for both high availability and cluster scaling), and a heatmap to show what is going on inside of a Hadoop cluster. (Apache Hadoop has no such graphical tool, which makes a Hadoop cluster a bit of a black box when it is running.)

With the M7 Edition, which starts shipping today, MapR is speeding up the HBase database layer for HDFS by pushing it down into that proprietary and most certainly closed source file system.

Just as MapR has been able to speed up HDFS by breaking data into chunks and spreading them all around the MapR distributed file system, the M7 Edition breaks HBase database tables into chunks and stores them in the MapR file system in close proximity to their raw files.
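To make the chunking idea concrete, here is a minimal sketch of splitting a table's sorted row keys into contiguous chunks and spreading them across nodes. This is not MapR's code, which is closed source. The function names, the fixed chunk size, and the round-robin placement are all assumptions made for illustration.

```python
from bisect import bisect_right

def split_into_chunks(row_keys, chunk_size):
    """Split row keys into contiguous, sorted key-range chunks (hypothetical helper)."""
    keys = sorted(row_keys)
    return [keys[i:i + chunk_size] for i in range(0, len(keys), chunk_size)]

def place_chunks(chunks, nodes):
    """Assign chunks to nodes round-robin (an assumed placement policy)."""
    placement = {node: [] for node in nodes}
    for i, chunk in enumerate(chunks):
        placement[nodes[i % len(nodes)]].append(chunk)
    return placement

def find_node(key, chunks, nodes):
    """Find the node holding `key` by binary search over each chunk's first key."""
    starts = [chunk[0] for chunk in chunks]
    idx = max(bisect_right(starts, key) - 1, 0)
    return nodes[idx % len(nodes)]

keys = ["row%03d" % i for i in range(10)]
nodes = ["node-a", "node-b"]
chunks = split_into_chunks(keys, 4)
placement = place_chunks(chunks, nodes)
```

A lookup such as `find_node("row005", chunks, nodes)` only has to consult the chunk start keys, which is the general reason range-partitioned tables can route a query to one node instead of scanning the cluster.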

The beauty is that even with this big architectural change, MapR continues to support all of the same HDFS and HBase APIs, and thus applications running atop M7 Edition have no idea that they are not running on the plain vanilla Apache Hadoop.

Well, until they see the performance, says Jack Norris, vice president of marketing at MapR. The company has spent the past six months tweaking and tuning M7 to speed up HBase queries.

Sociocast, one of the early testers for the M7 Edition code, found that the MapR setup ran HBase queries about three times faster than the kosher HBase and was twice as fast as Cassandra, the NoSQL data store that originated at Facebook.

Cassandra has what is called "eventual consistency" for its database, which is not a good fit for a lot of enterprise customers, who want absolute consistency. (We went into detail about how the M7 Edition works here.) MapR has done tests that show it can do over 1 million operations per second on M7 Edition on a ten-node cluster, and the HBase clone is able to support up to 1 trillion tables.

It doesn't look like Apache Drill, the project that MapR created to clone Google's Dremel query tool, made it into the M7 Edition release. Dremel is used for Google's BigQuery service (which came out of beta last May), which does SQL queries against the BigTable database layer that rides atop Google's current file system. HBase is a clone of sorts of BigTable, and clearly you need to do Drill to clone Dremel if you want to be all Googley and such. The Hive SQL-alike query tool, which compiles queries down into MapReduce batch jobs, is better than nothing – but just barely.
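For readers who have not seen what "compiling a query down into MapReduce" amounts to, the sketch below walks a query of the form SELECT page, COUNT(*) ... GROUP BY page through map, shuffle, and reduce phases in plain Python. It is an illustration of the general pattern only, not Hive's actual query plan; the function names and the `page` column are hypothetical.

```python
from collections import defaultdict

def map_phase(rows):
    """Map: emit a (group key, 1) pair for every input row."""
    for row in rows:
        yield row["page"], 1

def shuffle(pairs):
    """Shuffle: gather all values that share a key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: collapse each key's values into a single count."""
    return {key: sum(values) for key, values in groups.items()}

def count_by_page(rows):
    """Run the three phases in sequence over an in-memory list of rows."""
    return reduce_phase(shuffle(map_phase(rows)))

rows = [{"page": "/a"}, {"page": "/b"}, {"page": "/a"}]
result = count_by_page(rows)
```

On a real cluster each phase runs as a batch step across many machines with intermediate data written out between them, which is why this route is slow for ad hoc queries.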

In addition to rolling out the M7 Edition, MapR is also doing a private beta for an integrated search engine for Hadoop that it has developed in conjunction with LucidWorks, which was founded in 2007 and is the main company behind the Lucene and Solr search engines, employing about a quarter of the committers on those Apache projects.

Just like HDFS data chunks and HBase table segments are being stored in that proprietary MapR file system, the search engine indexes for data stored in that file system are being spread around the file system and not housed outside of it. That means you can clone, snapshot, and do other natural acts of a file system on the search engine indices. You can also create indices for subsets of data rather than everything.

The Solr search engine capability will be woven into the M3, M5, and M7 editions. It is in private beta testing now, and Norris says there are no plans for a public beta at the moment. It is expected to be generally available in the next quarter and to be used by business managers and analysts who just want to poke around in the data to see what is in there.

MapR did not provide pricing for the M7 Edition, but says that it costs $4,000 per node per year for the M5 Edition and thus you can expect to pay more for the one that gooses HBase speeds. ®
