
MapR cranks out updated Hadoop data muncher

Lays foundation for MapReduce 2.0


There is a slew of companies that want to be the Red Hat for open source Hadoop data chewing, making money by beefing it up and selling support for the collection of programs. MapR Technologies, which came out of stealth mode in May, has some proprietary extensions to Hadoop, but all of the goodies being added with MapR Distribution Version 1.2 are available in its open source distribution.

There are three important proprietary extensions to Hadoop in the MapR distribution. One is high-availability clustering for the Hadoop NameNode, which is the heartbeat of a Hadoop cluster – akin to the head node in a parallel supercomputer cluster.

The second is a revamped storage layer for the Hadoop Distributed File System (HDFS) that lets it be mounted like a Network File System (NFS) drive and supports random reads and writes. The third is a parallelized version of Hadoop's JobTracker, the job scheduler on a Hadoop cluster, that can run across multiple physical nodes so that it does not become a bottleneck.
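To make "random reads and writes" concrete, here is a minimal Python sketch of what ordinary file code can do once a file system is mounted like a local drive: seek to an offset and overwrite or read bytes in place. Stock HDFS, by contrast, is built around write-once files. The sketch uses a temporary directory as a stand-in for the mount; on a real cluster you would point `mount_point` at wherever the NFS export is mounted, and that path, like the helper names `patch_record` and `read_record`, is an assumption for illustration and not something MapR documents here.

```python
import os
import tempfile


def patch_record(path, offset, payload):
    """Overwrite bytes in the middle of an existing file, in place."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(payload)


def read_record(path, offset, length):
    """Read `length` bytes starting at `offset` without scanning the file."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


# Stand-in for an NFS mount point (hypothetical on a real cluster).
mount_point = tempfile.mkdtemp()
demo_path = os.path.join(mount_point, "demo.dat")

# Lay down three 4-byte records.
with open(demo_path, "wb") as f:
    f.write(b"AAAABBBBCCCC")

# Random write: replace only the middle record.
patch_record(demo_path, 4, b"ZZZZ")

# Random read: fetch only the middle record, then the whole file.
middle = read_record(demo_path, 4, 4)
whole = read_record(demo_path, 0, 12)
print(middle, whole)
```

Nothing in the code is Hadoop-specific, which is the point of an NFS mount: existing tools and scripts can work on cluster data through the normal file API.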

With MapR Distribution Version 1.2, MapR's techies have come up with an alternate implementation of the libhdfs file-access library for HDFS that completely bypasses the Java virtual machine and lets C and C++ applications, as well as scripting languages, get "native" access to HDFS. You don't have to recompile existing Hadoop applications, because this MapR libhdfs alternative has the same header files as the open source Apache Hadoop version of libhdfs.

V1.2 of MapR's Hadoop distro also upgrades HBase, the column-oriented distributed data store (modeled after Google's BigTable) that rides on top of HDFS, to the 0.90.4 release level. MapR says that it has come up with 15 fixes for stability and data corruption errors in HBase, and has back-ported fixes from future HBase releases to the 0.90.4 release. (This is exactly the kind of thing that Red Hat did with the Linux kernel in the Linux 2.4 and early Linux 2.6 kernel generations.)

The MapR update also includes native management client support for Windows 7 and Mac OS X, so if you don't want to administer a Hadoop cluster from a Linux machine you don't have to find an emulator and load the Hadoop client into it to dispatch work to the JobTracker.

The MapR Hadoop cluster itself runs only on Linux. "Nobody has asked us for Windows support for the cluster," MapR VP of marketing Jack Norris tells El Reg.

Norris adds that MapR has already laid the groundwork to implement the new MapReduce 2.0 architecture, known at Apache as the YARN project (Yet Another Resource Negotiator), which will split the two functions of the JobTracker – resource management, and job scheduling and monitoring – into two pieces. The YARN effort will also allow algorithms other than MapReduce to run across the clusters and their data, and yet remain under the control of Hadoop.
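The split is easier to picture with a toy model. The Python sketch below is an illustration only and does not mirror the real YARN or MapR APIs: the class names, the slot-counting scheme and the `run` loop are all assumptions made for the example. One object hands out cluster capacity and knows nothing about jobs; a separate per-job object schedules and tracks its own tasks using whatever capacity it is granted.

```python
class ResourceManager:
    """Cluster-wide piece: hands out slots, knows nothing about tasks."""

    def __init__(self, total_slots):
        self.free_slots = total_slots

    def allocate(self, wanted):
        # Grant as many slots as are free, up to the number requested.
        granted = min(wanted, self.free_slots)
        self.free_slots -= granted
        return granted

    def release(self, count):
        self.free_slots += count


class AppMaster:
    """Per-job piece: schedules and monitors tasks in the slots it is granted."""

    def __init__(self, name, tasks, rm):
        self.name = name
        self.pending = list(tasks)
        self.done = []
        self.rm = rm

    def run(self):
        while self.pending:
            granted = self.rm.allocate(len(self.pending))
            if granted == 0:
                raise RuntimeError("no free slots in the cluster")
            batch = self.pending[:granted]
            self.pending = self.pending[granted:]
            # Tasks are plain callables, so they need not be map or reduce steps.
            self.done.extend(task() for task in batch)
            self.rm.release(granted)
        return self.done


rm = ResourceManager(total_slots=2)
job = AppMaster("squares", [lambda i=i: i * i for i in range(5)], rm)
results = job.run()
print(results)
```

Because the resource piece only counts slots, any kind of job can sit on top of it, which is the property that lets algorithms besides MapReduce share the cluster.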

The final addition with v1.2 of MapR's Hadoop distribution is a virtual machine image of the entire distro, packaged up in a VMware ESXi container, that can be run on any ESXi-capable machine or even the freebie VMware Player so you can get a taste of Hadoop without having to set it up yourself. The intent is to make a single-node Hadoop setup that newbies can play with. And if you want to get a little crazy, you can install multiple VM images of the freebie MapR Hadoop distro and cluster those together.

The freebie edition of MapR's distro is called M3, and is a complete distribution packed with HDFS, HBase, Pig, Hive, Mahout, Cascading, Sqoop, Flume, and other common features. The MapR M5 Edition is the extended version that includes the parallel extensions to JobTracker and NameNode and the NFS mounting; it costs $4,000 per node. ®
