Feeds

Platform wants to out-map, out-reduce Hadoop

Teaching financial grids to dance like stuffed elephants

Top three mobile application threats

Chewing on big data using the MapReduce protocol, and the open source Hadoop stack that implements it, is all the rage these days. But there is more than one way to stuff an elephant.

The Hadoop tool created by Yahoo! (and named after a stuffed elephant) is now managed by the Apache Software Foundation, and it is the tool of choice for running MapReduce algorithms against unstructured data. Platform Computing, the pioneer of grid computing that has been plying the HPC racket for two decades, says it has created a better way to run MapReduce algorithms against big data: Plunk it on Platform's Symphony financial grid software.

Platform has not ported Hadoop to the Symphony tool or somehow split open its code and shimmed chunks of Hadoop into Symphony, explains Ken Hertzler, vice president of product management at the company. Instead, Hertzler tells El Reg, Platform has grabbed the Hadoop MapReduce APIs, which are written in Java just like Hadoop and Symphony are, and embedded support for the MapReduce APIs into Symphony.

Ditto for the APIs for Pig, the programming language created for Hadoop and analogous to SQL for a relational database (but not SQL-like), and the APIs for Hive, which is a query language for Hadoop that actually offers commands similar to SQL for those who want to extract data out of their mapped and reduced unstructured data.

To support applications written for Hadoop, Platform is adding support for the Hadoop Distributed File System (HDFS) underneath Symphony, and is still allowing for IBM's General Parallel File System (GPFS) and Appistry's CloudIQ Storage clustered file system to plug into Symphony. The Platform MapReduce product is being rebranded as the Platform Workload Manager when it is tweaked to support MapReduce code. Here's what it looks like conceptually:

Platform Symphony MapReduce support

Platform runs MapReduce code on Symphony

Platform also wants to support commercial MapReduce projects and inferfaces, including IBM's Bigsheets and Python and C++ interfaces for the MapReduce APIs.

Symphony, if you are not acquainted with it, was created by Platform nine years ago because financial services firms that were trying to use its Load Sharing Facility (LSF) to run risk arbitrage applications were very unhappy with the sluggish performance and scale of that gridding software for running time-sensitive workloads. While LSF is good at managing the workflow of multiple HPC jobs on a supercomputing cluster, it was not designed to run one or a few jobs at low latency and high throughput. So Platform gutted LSF and created Symphony from scratch in the Java programming language. And over the years, it has ramped up the scalability of Symphony so it can span lots of cores.

There are a number of problems besides scalability that Platform is trying to address by support the Hadoop/Pig/Hive API stack on top of Symphony. The first is workload management for MapReduce applications.

"In the current Hadoop distro, it is one job at a time," Hertzler tells El Reg. "You need to add distributed cluster logic to manage multiple MapReduce jobs at the same time on the same cluster." Or, use multiple Hadoop clusters, as Yahoo! does. "But Symphony is already a distributed workload manager and knows how to distribute data and work around a cluster."

Platform is also pitching the fact that using Symphony to run MapReduce workloads gives customers a choice of file systems for their MapReduce workloads.

"We're not tied to any file system," says Hertzler. "We plan to open it up so customers can attach to any existing file system."

High performance access to file storage

More from The Register

next story
This time it's 'Personal': new Office 365 sub covers just two devices
Redmond also brings Office into Google's back yard
Kingston DataTraveler MicroDuo: Turn your phone into a 72GB beast
USB-usiness in the front, micro-USB party in the back
Dropbox defends fantastically badly timed Condoleezza Rice appointment
'Nothing is going to change with Dr. Rice's appointment,' file sharer promises
Inside the Hekaton: SQL Server 2014's database engine deconstructed
Nadella's database sqares the circle of cheap memory vs speed
BOFH: Oh DO tell us what you think. *CLICK*
$%%&amp Oh dear, we've been cut *CLICK* Well hello *CLICK* You're breaking up...
Just what could be inside Dropbox's new 'Home For Life'?
Biz apps, messaging, photos, email, more storage – sorry, did you think there would be cake?
IT bods: How long does it take YOU to train up on new tech?
I'll leave my arrays to do the hard work, if you don't mind
Amazon reveals its Google-killing 'R3' server instances
A mega-memory instance that never forgets
prev story

Whitepapers

Top three mobile application threats
Learn about three of the top mobile application security threats facing businesses today and recommendations on how to mitigate the risk.
Combat fraud and increase customer satisfaction
Based on their experience using HP ArcSight Enterprise Security Manager for IT security operations, Finansbank moved to HP ArcSight ESM for fraud management.
The benefits of software based PBX
Why you should break free from your proprietary PBX and how to leverage your existing server hardware.
Five 3D headsets to be won!
We were so impressed by the Durovis Dive headset we’ve asked the company to give some away to Reg readers.
SANS - Survey on application security programs
In this whitepaper learn about the state of application security programs and practices of 488 surveyed respondents, and discover how mature and effective these programs are.