Feeds

Platform wants to out-map, out-reduce Hadoop

Teaching financial grids to dance like stuffed elephants

Internet Security Threat Report 2014

Chewing on big data using the MapReduce protocol, and the open source Hadoop stack that implements it, is all the rage these days. But there is more than one way to stuff an elephant.

The Hadoop tool created by Yahoo! (and named after a stuffed elephant) is now managed by the Apache Software Foundation, and it is the tool of choice for running MapReduce algorithms against unstructured data. Platform Computing, the pioneer of grid computing that has been plying the HPC racket for two decades, says it has created a better way to run MapReduce algorithms against big data: Plunk it on Platform's Symphony financial grid software.

Platform has not ported Hadoop to the Symphony tool or somehow split open its code and shimmed chunks of Hadoop into Symphony, explains Ken Hertzler, vice president of product management at the company. Instead, Hertzler tells El Reg, Platform has grabbed the Hadoop MapReduce APIs, which are written in Java just like Hadoop and Symphony are, and embedded support for the MapReduce APIs into Symphony.

Ditto for the APIs for Pig, the programming language created for Hadoop and analogous to SQL for a relational database (but not SQL-like), and the APIs for Hive, which is a query language for Hadoop that actually offers commands similar to SQL for those who want to extract data out of their mapped and reduced unstructured data.

To support applications written for Hadoop, Platform is adding support for the Hadoop Distributed File System (HDFS) underneath Symphony, and is still allowing for IBM's General Parallel File System (GPFS) and Appistry's CloudIQ Storage clustered file system to plug into Symphony. The Platform MapReduce product is being rebranded as the Platform Workload Manager when it is tweaked to support MapReduce code. Here's what it looks like conceptually:

Platform Symphony MapReduce support

Platform runs MapReduce code on Symphony

Platform also wants to support commercial MapReduce projects and inferfaces, including IBM's Bigsheets and Python and C++ interfaces for the MapReduce APIs.

Symphony, if you are not acquainted with it, was created by Platform nine years ago because financial services firms that were trying to use its Load Sharing Facility (LSF) to run risk arbitrage applications were very unhappy with the sluggish performance and scale of that gridding software for running time-sensitive workloads. While LSF is good at managing the workflow of multiple HPC jobs on a supercomputing cluster, it was not designed to run one or a few jobs at low latency and high throughput. So Platform gutted LSF and created Symphony from scratch in the Java programming language. And over the years, it has ramped up the scalability of Symphony so it can span lots of cores.

There are a number of problems besides scalability that Platform is trying to address by support the Hadoop/Pig/Hive API stack on top of Symphony. The first is workload management for MapReduce applications.

"In the current Hadoop distro, it is one job at a time," Hertzler tells El Reg. "You need to add distributed cluster logic to manage multiple MapReduce jobs at the same time on the same cluster." Or, use multiple Hadoop clusters, as Yahoo! does. "But Symphony is already a distributed workload manager and knows how to distribute data and work around a cluster."

Platform is also pitching the fact that using Symphony to run MapReduce workloads gives customers a choice of file systems for their MapReduce workloads.

"We're not tied to any file system," says Hertzler. "We plan to open it up so customers can attach to any existing file system."

Beginner's guide to SSL certificates

More from The Register

next story
Docker's app containers are coming to Windows Server, says Microsoft
MS chases app deployment speeds already enjoyed by Linux devs
'Hmm, why CAN'T I run a water pipe through that rack of media servers?'
Leaving Las Vegas for Armenia kludging and Dubai dune bashing
'Urika': Cray unveils new 1,500-core big data crunching monster
6TB of DRAM, 38TB of SSD flash and 120TB of disk storage
Facebook slurps 'paste sites' for STOLEN passwords, sprinkles on hash and salt
Zuck's ad empire DOESN'T see details in plain text. Phew!
SDI wars: WTF is software defined infrastructure?
This time we play for ALL the marbles
Windows 10: Forget Cloudobile, put Security and Privacy First
But - dammit - It would be insane to say 'don't collect, because NSA'
Oracle hires former SAP exec for cloudy push
'We know Larry said cloud was gibberish, and insane, and idiotic, but...'
Symantec backs out of Backup Exec: Plans to can appliance in Jan
Will still provide support to existing customers
prev story

Whitepapers

Forging a new future with identity relationship management
Learn about ForgeRock's next generation IRM platform and how it is designed to empower CEOS's and enterprises to engage with consumers.
Why cloud backup?
Combining the latest advancements in disk-based backup with secure, integrated, cloud technologies offer organizations fast and assured recovery of their critical enterprise data.
Win a year’s supply of chocolate
There is no techie angle to this competition so we're not going to pretend there is, but everyone loves chocolate so who cares.
High Performance for All
While HPC is not new, it has traditionally been seen as a specialist area – is it now geared up to meet more mainstream requirements?
Intelligent flash storage arrays
Tegile Intelligent Storage Arrays with IntelliFlash helps IT boost storage utilization and effciency while delivering unmatched storage savings and performance.