MapR cranks out updated Hadoop data muncher

Lays foundation for MapReduce 2.0

There is a slew of companies that want to be the Red Hat for open source Hadoop data chewing, making money by beefing it up and selling support for the collection of programs. MapR Technologies, which came out of stealth mode in May, has some proprietary extensions to Hadoop, but all of the goodies being added with MapR Distribution Version 1.2 are available in its open source distribution.

There are three important proprietary extensions to Hadoop in the MapR distribution. One is high-availability clustering for the Hadoop NameNode, which is the heartbeat of a Hadoop cluster – akin to the head node in a parallel supercomputer cluster.

The second is a revamped storage layer for the Hadoop Distributed File System (HDFS) that allows it to be mounted like a Network File System (NFS) drive and supports random reads and writes. The third is a parallelized version of Hadoop's JobTracker – the job scheduler on a Hadoop cluster – that can run across multiple physical nodes so that it does not become a bottleneck.
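To make the NFS point concrete: once a file system is mounted as an ordinary drive, plain file I/O with seeks is enough to read and overwrite bytes in the middle of a file. The sketch below is a rough illustration only. It uses a temporary local directory as a stand-in for a hypothetical mount point, and the helper names are invented for the example, not taken from MapR.

```python
import os
import tempfile


def overwrite_in_place(path, offset, data):
    """Overwrite len(data) bytes at `offset`, leaving the rest of the file alone."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)


def read_range(path, offset, length):
    """Read `length` bytes starting at `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


# Assumption: a temporary directory stands in for an NFS mount point.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "events.log")

with open(demo_path, "wb") as f:
    f.write(b"0123456789")

overwrite_in_place(demo_path, 4, b"XY")   # random write in the middle of the file
result = read_range(demo_path, 2, 6)      # random read spanning the change
print(result)
```

Nothing here is Hadoop-specific, which is the point of an NFS-style mount: existing tools and scripts can work on the data without going through a Hadoop client.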

With MapR Distribution Version 1.2, MapR's techies have come up with an alternate implementation of the libhdfs file-access library for HDFS that completely bypasses the Java virtual machine and lets C and C++ applications, as well as scripting languages, get "native" access to HDFS. You don't have to recompile existing Hadoop applications, because this MapR libhdfs alternative has the same header files as the open source Apache Hadoop version of libhdfs.

V1.2 of MapR's Hadoop distro also upgrades the HBase column-oriented distributed data store (modeled after Google's BigTable), which rides on top of HDFS, to the 0.90.4 release level. MapR says that it came up with 15 fixes for stability and data corruption errors in HBase, and has back-ported fixes from future HBase releases to the 0.90.4 release. (This is exactly the kind of thing that Red Hat did with the Linux kernel in the Linux 2.4 and early Linux 2.6 kernel generations.)

The MapR update also includes native management client support for Windows 7 and Mac OS X, so if you don't want to administer a Hadoop cluster from a Linux machine you don't have to find an emulator and load the Hadoop client into it to dispatch work to the JobTracker.

The MapR Hadoop cluster itself runs only on Linux. "Nobody has asked us for Windows support for the cluster," MapR VP of marketing Jack Norris tells El Reg.

Norris adds that MapR has already laid the groundwork to implement the new MapReduce 2.0 architecture, also known as the YARN (Yet Another Resource Negotiator) project at Apache, which will break the two functions of the JobTracker – resource management, and job scheduling and monitoring – into two pieces. The YARN effort will also allow algorithms other than MapReduce to run across the clusters and their data, and yet remain under the control of Hadoop.
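For a feel of what that split means, here is a toy sketch with one object that only hands out cluster capacity and another that only schedules and monitors a single job's tasks. The class and method names are illustrative and do not reflect YARN's actual interfaces, and tasks run in-process rather than on real nodes.

```python
class ResourceManager:
    """Cluster-wide role: tracks free slots per node and hands them out."""

    def __init__(self, slots_per_node):
        self.free = dict(slots_per_node)

    def allocate(self, count):
        """Grant up to `count` slots, returning the node name for each slot."""
        granted = []
        for node in sorted(self.free):
            while self.free[node] > 0 and len(granted) < count:
                self.free[node] -= 1
                granted.append(node)
        return granted

    def release(self, node):
        self.free[node] += 1


class JobMaster:
    """Per-job role: schedules this job's tasks and records their results."""

    def __init__(self, name, tasks, rm):
        self.name = name
        self.pending = list(tasks)   # list of (task_name, callable) pairs
        self.results = {}
        self.rm = rm

    def run(self):
        while self.pending:
            nodes = self.rm.allocate(len(self.pending))
            if not nodes:
                raise RuntimeError("cluster has no free slots")
            for node in nodes:
                task_name, fn = self.pending.pop(0)
                self.results[task_name] = fn()   # stand-in for remote execution
                self.rm.release(node)
        return self.results


rm = ResourceManager({"node-a": 1, "node-b": 1})
job = JobMaster(
    "wordcount",
    [("map-0", lambda: 3), ("map-1", lambda: 4), ("reduce-0", lambda: 7)],
    rm,
)
results = job.run()
print(results)
```

Because the capacity-granting object knows nothing about what a task does, a job that is not MapReduce-shaped could ask it for slots in the same way, which is the flexibility the paragraph above describes.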

The final addition with v1.2 of MapR's Hadoop distribution is a virtual machine image of the entire distro, packaged up in a VMware ESXi container, that can be run on any ESXi-capable machine or even the freebie VMware Player so you can get a taste of Hadoop without having to set it up yourself. The intent is to make a single-node Hadoop setup that newbies can play with. And if you want to get a little crazy, you can install multiple VM images of the freebie MapR Hadoop distro and cluster those together.

The freebie edition of MapR's distro is called M3, and is a complete distribution packed with HDFS, HBase, Pig, Hive, Mahout, Cascading, Sqoop, Flume, and other common features. The MapR M5 Edition is the extended version that includes the parallel extensions to JobTracker and NameNode and the NFS mounting; it costs $4,000 per node. ®
