
MapR cranks out updated Hadoop data muncher

Lays foundation for MapReduce 2.0


There is a slew of companies that want to be the Red Hat of open source Hadoop data chewing, making money by beefing up the collection of programs and selling support for it. MapR Technologies, which came out of stealth mode in May, has some proprietary extensions to Hadoop, but all of the goodies being added with MapR Distribution Version 1.2 are available in its open source distribution.

There are three important proprietary extensions to Hadoop in the MapR distribution. The first is high-availability clustering for the Hadoop NameNode, which keeps track of where all the data in the file system lives and is the heart of a Hadoop cluster, akin to the head node in a parallel supercomputer cluster.

The second is a revamped storage layer for the Hadoop Distributed File System that allows it to be mounted like a Network File System (NFS) drive and supports random reads and writes. The third is a parallelized version of Hadoop's JobTracker, the job scheduler on a Hadoop cluster, which can run across multiple physical nodes so that it does not become a bottleneck.
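To make the random read/write point concrete: files in stock HDFS are write-once (at most appendable), so changing a few bytes in the middle of a file means rewriting the whole thing. On a file system mounted over NFS, ordinary file I/O does the job. The Python sketch that follows is an illustration under assumptions, not MapR code: the helper names are hypothetical, and a temporary directory stands in for a real cluster mount point so the example runs anywhere.

```python
import os
import tempfile


def overwrite_in_place(path: str, offset: int, data: bytes) -> None:
    """Random write: overwrite bytes at `offset` without rewriting the file.

    Stock HDFS does not allow this; on an NFS-mounted file system the
    ordinary POSIX-style calls used here are enough.
    """
    # "r+b" opens an existing file for reading and writing without truncating it.
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)


def read_at(path: str, offset: int, length: int) -> bytes:
    """Random read: fetch `length` bytes starting at `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


# Hypothetical stand-in for a cluster mount point; a temp dir keeps it runnable.
mount_point = tempfile.mkdtemp()
path = os.path.join(mount_point, "records.dat")
with open(path, "wb") as f:
    f.write(b"AAAAAAAAAA")

overwrite_in_place(path, 3, b"xyz")
print(read_at(path, 0, 10))  # b'AAAxyzAAAA'
```

Only the three bytes at offsets 3 to 5 change; nothing else in the file is touched or rewritten.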

With MapR Distribution Version 1.2, MapR's techies have come up with an alternate implementation of the libhdfs file-access library for HDFS that completely bypasses the Java virtual machine, giving C and C++ applications, as well as scripting languages, "native" access to HDFS. You don't have to recompile existing Hadoop applications, because this MapR libhdfs alternative uses the same header files as the open source Apache Hadoop version of libhdfs.

V1.2 of MapR's Hadoop distro also upgrades the HBase column-oriented distributed data store (modeled after Google's BigTable), which rides on top of HDFS, to the 0.90.4 release. MapR says that it found 15 fixes for stability and data corruption errors in HBase, and has back-ported fixes from future HBase releases to the 0.90.4 release. (This is exactly the kind of thing that Red Hat did with the Linux kernel in the Linux 2.4 and early Linux 2.6 kernel generations.)

The MapR update also includes native management client support for Windows 7 and Mac OS X, so if you don't want to administer a Hadoop cluster from a Linux machine, you don't have to find an emulator and load the Hadoop client into it to dispatch work to the JobTracker.

The MapR Hadoop cluster itself runs only on Linux. "Nobody has asked us for Windows support for the cluster," MapR VP of marketing Jack Norris tells El Reg.

Norris adds that MapR has already laid the groundwork to implement the new MapReduce 2.0 architecture, known at Apache as the YARN (Yet Another Resource Negotiator) project, which will split the two functions of the JobTracker, resource management on the one hand and job scheduling and monitoring on the other, into separate pieces. YARN will also allow processing frameworks other than MapReduce to run across the cluster and its data while remaining under Hadoop's control.
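As a rough illustration of that split, here is a toy Python model, not YARN's actual API: a cluster-wide resource manager that only hands out slots, and a per-job application master that schedules and tracks its own tasks. YARN's design calls these roles the ResourceManager and the per-application ApplicationMaster, but the classes and methods below are hypothetical simplifications written for this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ResourceManager:
    """Toy cluster-wide resource manager: it only tracks and grants slots,
    and knows nothing about what the jobs using them actually do."""
    free_slots: int

    def allocate(self, requested: int) -> int:
        # Grant as many slots as are free, up to the number requested.
        granted = min(requested, self.free_slots)
        self.free_slots -= granted
        return granted

    def release(self, count: int) -> None:
        # Return slots to the shared pool once tasks finish.
        self.free_slots += count


@dataclass
class ApplicationMaster:
    """Toy per-job scheduler/monitor: one instance per application, so
    scheduling logic is no longer centralized in a single JobTracker."""
    name: str
    pending_tasks: List[Callable[[], object]]
    completed: List[object] = field(default_factory=list)

    def run(self, rm: ResourceManager) -> None:
        while self.pending_tasks:
            slots = rm.allocate(len(self.pending_tasks))
            if slots == 0:
                break  # a real scheduler would wait; the toy just gives up
            batch = self.pending_tasks[:slots]
            self.pending_tasks = self.pending_tasks[slots:]
            for task in batch:
                self.completed.append(task())  # "execute" the task in a slot
            rm.release(slots)


rm = ResourceManager(free_slots=2)
mr_job = ApplicationMaster("wordcount", [lambda: 1, lambda: 2, lambda: 3])
mr_job.run(rm)
print(mr_job.completed)  # [1, 2, 3]
```

Because the resource manager never looks inside a task, a second application master running a non-MapReduce job can draw on the same pool of slots, which is the point of pulling the two functions apart.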

The final addition with v1.2 of MapR's Hadoop distribution is a virtual machine image of the entire distro, packaged for VMware ESXi, that can run on any ESXi-capable machine or even the freebie VMware Player, so you can get a taste of Hadoop without having to set it up yourself. The intent is to give newbies a single-node Hadoop setup to play with. And if you want to get a little crazy, you can install multiple VM images of the freebie MapR Hadoop distro and cluster them together.

The freebie edition of MapR's distro is called M3, a complete distribution that includes HDFS, HBase, Pig, Hive, Mahout, Cascading, Sqoop, Flume, and other common components. The MapR M5 Edition is the extended version, which adds the high-availability NameNode, the parallel JobTracker, and NFS mounting; it costs $4,000 per node. ®
