Bringing Hadoop in from the cold: MapR is throwing adoption barriers on the fire

V4.1 has three shortcuts to get clusters up and running faster


Comment The pace of Hadoop development is relentless. Hortonworks has recently held its IPO, distribution owners are striving to get their versions deployed faster, and alliances are forming fast. MapR is extending its own reach by adding features that make the software easier to adopt.

MapR was founded in 2009 to produce an enterprise-grade Hadoop distribution, and has been aggressively funded and grown: it has raised US$174m over five rounds, plus US$30m of debt financing that accompanied a US$80m Series E round in 2014. It has some 700 customers.

Meanwhile, Hortonworks, founded in 2011, grew at an even more breakneck pace and raised US$248m in five rounds. Since the IPO its business performance has not exactly been stellar: it posted a net loss of US$90m for the final calendar quarter of 2014, on revenues that grew 55 per cent year on year to US$12.7m. Full-year revenues were US$46m, against a hefty net loss of US$177.3m.

Before MapR can IPO, it has to grow its business well beyond its 700 or so customers, and show that investors won’t be disappointed.

To that end, the latest MapR Hadoop 4.1 distribution includes:

  • MapR-DB table replication: this provides multiple active replica clusters across the world thanks to realtime asynchronous replication.
  • Table replication delivers realtime disaster recovery to reduce the risk of data loss upon site-wide failure.
  • The MapR POSIX client gives apps running on edge nodes NFS access to the cluster, with support for compression, parallel access, authentication and encryption.
  • A C API for MapR-DB, giving software engineers the ability to write realtime Hadoop applications.

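What "asynchronous" replication implies can be shown with a toy model. The Python sketch below is not MapR's implementation and uses no MapR API; the `Cluster` class, its `put`/`ship`/`get` methods and the higher-version-wins conflict rule are all hypothetical. Each site applies and acknowledges a write locally straight away and queues it for shipment to its peers later, which is why both sites can accept writes (active-active) and also why writes still sitting in the queue when a site fails would be lost.

```python
from collections import deque


class Cluster:
    """Toy model of one site in an active-active, asynchronously replicated table.

    Hypothetical illustration only; not MapR-DB's design or API.
    """

    def __init__(self, name):
        self.name = name
        self.table = {}        # key -> (version, value)
        self.outbox = deque()  # local writes not yet shipped to peers
        self.peers = []

    def put(self, key, value, version):
        # Applied and acknowledged locally at once; replication happens
        # later in ship(), which is what makes it asynchronous.
        self._apply(key, value, version)
        self.outbox.append((key, value, version))

    def _apply(self, key, value, version):
        # Hypothetical conflict rule: the higher version wins.
        # Versions are (timestamp, site name) tuples so ties are broken.
        current = self.table.get(key)
        if current is None or version > current[0]:
            self.table[key] = (version, value)

    def ship(self):
        # Drain queued writes to every peer, e.g. on a background timer.
        while self.outbox:
            key, value, version = self.outbox.popleft()
            for peer in self.peers:
                peer._apply(key, value, version)

    def get(self, key):
        entry = self.table.get(key)
        return None if entry is None else entry[1]


# Two sites, each replicating to the other.
london = Cluster("london")
tokyo = Cluster("tokyo")
london.peers.append(tokyo)
tokyo.peers.append(london)

london.put("user:42", "basket=3", version=(1, "london"))
print(tokyo.get("user:42"))  # None: the write has not been shipped yet
london.ship()
print(tokyo.get("user:42"))  # basket=3
```

If the London site had failed between `put` and `ship`, Tokyo would never have seen the write. That window is the residual data-loss risk that asynchronous replication trades for low write latency at each site.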
MapR says its active-active, cross-data-center capability means operational data can be stored and processed close to users or devices, and replicated to a central analytics cluster for larger-scale analytics on enterprise-wide data.

Anil Gadre, MapR Technologies product management SVP, had a prepared remark: “The newest version of the MapR Distribution extends real-time analysis on big and fast data to geographically-dispersed locations, enabling businesses to gain deeper insights and act on operational data as it happens.” Sounds good.

Mixing Big Data with other workloads on a set of servers

Mesosphere has devised a Data Centre OS (DCOS) for managing data center and cloud resources at scale. The DCOS core is Mesos, a distributed systems kernel that abstracts CPU, memory, storage and other compute resources, allowing developers to treat the data center as a single pool of resources.

MapR has got together with Mesosphere to produce Myriad, a resource management framework that allows Apache YARN jobs to run alongside other applications and services in enterprise and cloud data centers.

Myriad is an open-source project built on the idea of consolidating big data and other workloads onto a shared pool of resources for greater server utilization and operational efficiency. It lets YARN jobs run on a DCOS-managed server cluster.

That means you don't need a dedicated YARN cluster for big data workloads and a separate set of Mesos-managed servers for everything else. You can run both, MapR says, on the same set of servers: "Web services, streaming applications (like Storm), build systems, continuous integration tools (like Jenkins), HPC jobs (like MPI), Docker containers, as well as custom scripts and applications."

Florian Leibert, Mesosphere CEO and co-founder, said: “Myriad allows you to run … all of your big data workloads and distributed applications and systems on a single pool of resources. Big data developers get the best of YARN’s power for Hadoop-driven workloads, and Mesos’ ability to run any other kind of workload, including non-Hadoop applications like Web applications and other long-running services.”

Quick Starts

MapR says it has provided three shortcuts – Quick Starts – to get Hadoop implementations up and running faster:

  • Data Warehouse Optimisation and Analytics Solution: This gives customers the flexibility to use Hadoop alongside their data warehouse, reducing overall system cost by performing transformations on Hadoop and freeing up the storage and capacity those transformations previously consumed. Customers can add more data types and sources for more granular and richer analytics across the combined Hadoop and data warehouse system.
  • Security Log Analytics Solution: This enables analysis of historical data as well as realtime analysis of large volumes of security data, which can help in early detection of advanced and unknown threats. This augments Security Information and Event Management (SIEM) systems by providing cost-effective storage and processing for deep analytics and by predicting anomalous behavior within the environment to identify unknown threats.
  • Recommendation Engine Solution: This helps businesses increase revenues and customer loyalty by delivering realtime offers for products and services based on past transactions, customer behavior and other customer attributes.

Big data Hadoopery is like panning for gold in streams of data with novel equipment and uncertain outcomes. It's amazing it has been adopted as fast as it has. MapR hopes that removing these barriers will help adoption grow further.

Version 4.1 of the MapR Distribution is available now. The Quick Starts are available from MapR and authorised partners worldwide, with pricing starting at US$30,000. ®


Biting the hand that feeds IT © 1998–2017