
MapR cranks out updated Hadoop data muncher

Lays foundation for MapReduce 2.0

There is a slew of companies that want to be the Red Hat for open source Hadoop data chewing, making money by beefing it up and selling support for the collection of programs. MapR Technologies, which came out of stealth mode in May, has some proprietary extensions to Hadoop, but all of the goodies being added with MapR Distribution Version 1.2 are available in its open source distribution.

There are three important proprietary extensions to Hadoop in the MapR distribution. One is high-availability clustering for the Hadoop NameNode, which is the heartbeat of a Hadoop cluster – akin to the head node in a parallel supercomputer cluster.

The second is a revamped storage layer for the Hadoop Distributed File System that allows it to be mounted like a Network File System (NFS) drive and for random reads and writes to be done on it, and the third is a parallelized version of Hadoop's JobTracker – the job scheduler on a Hadoop cluster – that can run across multiple physical nodes and not become a bottleneck.
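To make the random read/write point concrete, here is a minimal Python sketch of the kind of in-place update that ordinary file I/O allows once a file system is mounted like a local or NFS drive. It is an illustration only, not MapR code: the `patch_record` helper and the `events.dat` file name are made up for the example, and a temporary directory stands in for whatever mount point a real cluster would expose.

```python
import os
import tempfile


def patch_record(path: str, offset: int, payload: bytes) -> bytes:
    """Overwrite len(payload) bytes at `offset` in place, then read them back."""
    # "r+b" opens an existing file for reading and writing without truncating it.
    with open(path, "r+b") as f:
        f.seek(offset)            # jump to an arbitrary position (random access)
        f.write(payload)          # overwrite bytes in the middle of the file
        f.flush()
        os.fsync(f.fileno())      # ask the OS to push the write to storage
        f.seek(offset)
        return f.read(len(payload))


def demo() -> bytes:
    # Assumption: a temporary directory stands in for a mounted cluster path.
    with tempfile.TemporaryDirectory() as mount_point:
        path = os.path.join(mount_point, "events.dat")  # hypothetical file name
        with open(path, "wb") as f:
            f.write(b"A" * 32)    # 32 bytes of filler data
        patch_record(path, 8, b"HELLO")
        with open(path, "rb") as f:
            return f.read()


if __name__ == "__main__":
    print(demo())
```

Stock HDFS is built around write-once, append-style files, so a seek-and-overwrite like this is the sort of operation that needs a storage layer supporting random writes.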

With MapR Distribution Version 1.2, MapR's techies have come up with an alternate implementation of the libhdfs file-access library for HDFS that completely bypasses the Java virtual machine and lets C and C++ applications and other scripting languages get "native" access to HDFS. You don't have to recompile existing Hadoop applications, because this MapR libhdfs alternative has the same header files as the open source Apache Hadoop version of libhdfs.

V1.2 of MapR's Hadoop distro also upgrades HBase, the column-oriented distributed data store (modeled after Google's BigTable) that rides on top of HDFS, to the 0.90.4 release level. MapR says that it has made 15 fixes for stability and data corruption errors in HBase, and has back-ported fixes from future HBase releases to the 0.90.4 release. (This is exactly the kind of thing that Red Hat did with the Linux kernel in the Linux 2.4 and early Linux 2.6 kernel generations.)

The MapR update also includes native management client support for Windows 7 and Mac OS X, so if you don't want to administer a Hadoop cluster from a Linux machine, you don't have to find an emulator and load the Hadoop client into it to dispatch work to the JobTracker.

The MapR Hadoop cluster itself runs only on Linux. "Nobody has asked us for Windows support for the cluster," MapR VP of marketing Jack Norris tells El Reg.

Norris adds that MapR has already laid the groundwork to implement the new MapReduce 2.0 architecture, also known as the YARN (Yet Another Resource Negotiator) project at Apache, which will break the two functions of the JobTracker – resource management, and job scheduling and monitoring – into two pieces. The YARN effort will also allow algorithms other than MapReduce to run across the clusters and their data, and yet remain under the control of Hadoop.

The final addition with v1.2 of MapR's Hadoop distribution is a virtual machine image of the entire distro, packaged up in a VMware ESXi container, that can be run on any ESXi-capable machine or even the freebie VMware Player so you can get a taste of Hadoop without having to set it up yourself. The intent is to make a single-node Hadoop setup that newbies can play with. And if you want to get a little crazy, you can install multiple VM images of the freebie MapR Hadoop distro and cluster those together.

The freebie edition of MapR's distro is called M3, and is a complete distribution packed with HDFS, HBase, Pig, Hive, Mahout, Cascading, Sqoop, Flume, and other common components. The MapR M5 Edition is the extended version that includes the parallel extensions to JobTracker and NameNode and the NFS mounting; it costs $4,000 per node. ®
