MapR cranks out updated Hadoop data muncher

Lays foundation for MapReduce 2.0

A slew of companies want to be the Red Hat of open source Hadoop data chewing, making money by beefing it up and selling support for the collection of programs. MapR Technologies, which came out of stealth mode in May, has some proprietary extensions to Hadoop, but all of the goodies being added with MapR Distribution Version 1.2 are available in its open source distribution.

There are three important proprietary extensions to Hadoop in the MapR distribution. One is high-availability clustering for the Hadoop NameNode, which is the heartbeat of a Hadoop cluster – akin to the head node in a parallel supercomputer cluster.

The second is a revamped storage layer for the Hadoop Distributed File System (HDFS) that allows it to be mounted like a Network File System (NFS) drive and supports random reads and writes. The third is a parallelized version of Hadoop's JobTracker – the job scheduler on a Hadoop cluster – that can run across multiple physical nodes and not become a bottleneck.
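
To make the random read and write point concrete, here is a minimal sketch in Python of what an NFS-style mount permits: ordinary file calls that seek to an offset and overwrite bytes in place, something stock HDFS, with its write-once files, does not offer. The file name `events.dat` and the helpers `patch_record` and `read_range` are hypothetical, and a temporary directory stands in for the mount point, which on a real cluster would be wherever the administrator mounted it.

```python
import os
import tempfile


def patch_record(path, offset, payload):
    """Overwrite bytes in place at `offset` without rewriting the whole file."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(payload)


def read_range(path, offset, length):
    """Read `length` bytes starting at `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


# Assumption: a temporary directory stands in for the NFS mount point.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "events.dat")

# Lay down three fixed-width, four-byte records.
with open(demo_path, "wb") as f:
    f.write(b"AAAABBBBCCCC")

# Replace only the middle record, then read the file back.
patch_record(demo_path, 4, b"XXXX")
result = read_range(demo_path, 0, 12)
```

Nothing here is specific to Hadoop, which is the point: any program that can work with a local file can work with the mounted cluster in the same way.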

With MapR Distribution Version 1.2, MapR's techies have come up with an alternate implementation of the libhdfs file-access library for HDFS that completely bypasses the Java virtual machine and lets C and C++ applications and other scripting languages get "native" access to HDFS. You don't have to recompile existing Hadoop applications, because this MapR libhdfs alternative has the same header files as the open source Apache Hadoop version of libhdfs.

V1.2 of MapR's Hadoop distro also upgrades HBase, the column-oriented distributed data store (modeled after Google's BigTable) that rides on top of HDFS, to the 0.90.4 release level. MapR says that it has come up with 15 fixes for stability and data corruption errors in HBase, and has back-ported fixes from future HBase releases to the 0.90.4 release. (This is exactly the kind of thing that Red Hat did with the Linux kernel in the Linux 2.4 and early Linux 2.6 kernel generations.)

The MapR update also includes native management client support for Windows 7 and Mac OS X, so if you don't want to administer a Hadoop cluster from a Linux machine you don't have to find an emulator and load the Hadoop client into it to dispatch work to the JobTracker.

The MapR Hadoop cluster itself runs only on Linux. "Nobody has asked us for Windows support for the cluster," MapR VP of marketing Jack Norris tells El Reg.

Norris adds that MapR has already laid the groundwork to implement the new MapReduce 2.0 architecture, known at Apache as the YARN project (Yet Another Resource Negotiator), which will break the two functions of the JobTracker – resource management, and job scheduling and monitoring – into two pieces. The YARN effort will also allow algorithms other than MapReduce to run across the clusters and their data, and yet remain under the control of Hadoop.
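
The split between resource management and per-job scheduling can be pictured with a toy model in Python. This is a sketch of the idea only, not the YARN API: the class names `ResourceManager` and `JobMaster`, the slot counts, and the task names are all invented for illustration. One object hands out cluster slots and knows nothing about jobs, while a separate object per job decides which task runs in each slot it is granted.

```python
class ResourceManager:
    """Tracks free task slots per node; knows nothing about individual jobs."""

    def __init__(self, slots_per_node):
        self.free = dict(slots_per_node)

    def allocate(self, wanted):
        """Grant up to `wanted` slots and return the node name for each one."""
        granted = []
        for node in sorted(self.free):
            while self.free[node] > 0 and len(granted) < wanted:
                self.free[node] -= 1
                granted.append(node)
        return granted

    def release(self, node):
        """Hand a slot back once its task has finished."""
        self.free[node] += 1


class JobMaster:
    """Schedules and monitors the tasks of a single job."""

    def __init__(self, name, tasks, rm):
        self.name = name
        self.pending = list(tasks)
        self.done = []
        self.rm = rm

    def run(self):
        """Request slots in rounds until every task has been placed."""
        while self.pending:
            nodes = self.rm.allocate(len(self.pending))
            if not nodes:
                raise RuntimeError("no free slots in the cluster")
            for node in nodes:
                task = self.pending.pop(0)
                self.done.append((task, node))  # pretend the task ran here
                self.rm.release(node)
        return self.done


# Hypothetical three-slot cluster and a five-task job.
rm = ResourceManager({"node1": 2, "node2": 1})
job = JobMaster("wordcount", ["map-0", "map-1", "map-2", "map-3", "reduce-0"], rm)
placements = job.run()
```

Because the slot-granting object never looks inside a job, a second kind of job master running something other than MapReduce could draw on the same pool of slots, which is the flexibility the YARN effort is after.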

The final addition with v1.2 of MapR's Hadoop distribution is a virtual machine image of the entire distro, packaged up in a VMware ESXi container, that can be run on any ESXi-capable machine or even the freebie VMware Player so you can get a taste of Hadoop without having to set it up yourself. The intent is to make a single-node Hadoop setup that newbies can play with. And if you want to get a little crazy, you can install multiple VM images of the freebie MapR Hadoop distro and cluster those together.

The freebie edition of MapR's distro is called M3, and is a complete distribution packed with HDFS, HBase, Pig, Hive, Mahout, Cascading, Sqoop, Flume, and other common features. The MapR M5 Edition is the extended version that includes the parallel extensions to JobTracker and NameNode and the NFS mounting; it costs $4,000 per node. ®