Feeds

Platform wants to out-map, out-reduce Hadoop

Teaching financial grids to dance like stuffed elephants

Secure remote control for conventional and virtual desktops

Chewing on big data using the MapReduce protocol, and the open source Hadoop stack that implements it, is all the rage these days. But there is more than one way to stuff an elephant.

The Hadoop tool created by Yahoo! (and named after a stuffed elephant) is now managed by the Apache Software Foundation, and it is the tool of choice for running MapReduce algorithms against unstructured data. Platform Computing, the pioneer of grid computing that has been plying the HPC racket for two decades, says it has created a better way to run MapReduce algorithms against big data: Plunk it on Platform's Symphony financial grid software.

Platform has not ported Hadoop to the Symphony tool or somehow split open its code and shimmed chunks of Hadoop into Symphony, explains Ken Hertzler, vice president of product management at the company. Instead, Hertzler tells El Reg, Platform has grabbed the Hadoop MapReduce APIs, which are written in Java just like Hadoop and Symphony are, and embedded support for the MapReduce APIs into Symphony.

Ditto for the APIs for Pig, the programming language created for Hadoop and analogous to SQL for a relational database (but not SQL-like), and the APIs for Hive, which is a query language for Hadoop that actually offers commands similar to SQL for those who want to extract data out of their mapped and reduced unstructured data.

To support applications written for Hadoop, Platform is adding support for the Hadoop Distributed File System (HDFS) underneath Symphony, and is still allowing for IBM's General Parallel File System (GPFS) and Appistry's CloudIQ Storage clustered file system to plug into Symphony. The Platform MapReduce product is being rebranded as the Platform Workload Manager when it is tweaked to support MapReduce code. Here's what it looks like conceptually:

Platform Symphony MapReduce support

Platform runs MapReduce code on Symphony

Platform also wants to support commercial MapReduce projects and inferfaces, including IBM's Bigsheets and Python and C++ interfaces for the MapReduce APIs.

Symphony, if you are not acquainted with it, was created by Platform nine years ago because financial services firms that were trying to use its Load Sharing Facility (LSF) to run risk arbitrage applications were very unhappy with the sluggish performance and scale of that gridding software for running time-sensitive workloads. While LSF is good at managing the workflow of multiple HPC jobs on a supercomputing cluster, it was not designed to run one or a few jobs at low latency and high throughput. So Platform gutted LSF and created Symphony from scratch in the Java programming language. And over the years, it has ramped up the scalability of Symphony so it can span lots of cores.

There are a number of problems besides scalability that Platform is trying to address by support the Hadoop/Pig/Hive API stack on top of Symphony. The first is workload management for MapReduce applications.

"In the current Hadoop distro, it is one job at a time," Hertzler tells El Reg. "You need to add distributed cluster logic to manage multiple MapReduce jobs at the same time on the same cluster." Or, use multiple Hadoop clusters, as Yahoo! does. "But Symphony is already a distributed workload manager and knows how to distribute data and work around a cluster."

Platform is also pitching the fact that using Symphony to run MapReduce workloads gives customers a choice of file systems for their MapReduce workloads.

"We're not tied to any file system," says Hertzler. "We plan to open it up so customers can attach to any existing file system."

Secure remote control for conventional and virtual desktops

More from The Register

next story
Linux? Bah! Red Hat has its eye on the CLOUD – and it wants to own it
CEO says it will be 'undisputed leader' in enterprise cloud tech
Oracle SHELLSHOCKER - data titan lists unpatchables
Database kingpin lists 32 products that can't be patched (yet) as GNU fixes second vuln
Ello? ello? ello?: Facebook challenger in DDoS KNOCKOUT
Gets back up again after half an hour though
Hey, what's a STORAGE company doing working on Internet-of-Cars?
Boo - it's not a terabyte car, it's just predictive maintenance and that
Troll hunter Rackspace turns Rotatable's bizarro patent to stone
News of the Weird: Screen-rotating technology declared unpatentable
prev story

Whitepapers

A strategic approach to identity relationship management
ForgeRock commissioned Forrester to evaluate companies’ IAM practices and requirements when it comes to customer-facing scenarios versus employee-facing ones.
Storage capacity and performance optimization at Mizuno USA
Mizuno USA turn to Tegile storage technology to solve both their SAN and backup issues.
High Performance for All
While HPC is not new, it has traditionally been seen as a specialist area – is it now geared up to meet more mainstream requirements?
Beginner's guide to SSL certificates
De-mystify the technology involved and give you the information you need to make the best decision when considering your online security options.
Security for virtualized datacentres
Legacy security solutions are inefficient due to the architectural differences between physical and virtual environments.