Platform wants to out-map, out-reduce Hadoop

Teaching financial grids to dance like stuffed elephants

Intelligent flash storage arrays

Why would Platform do that?

Why would Platform do that? Because companies are already generating and storing data in their file systems and they don't want to port their data from one file system to HDFS. Then they have a data porting problem every time they want to run a job. Platform says it makes much more sense to chew on the big data where it sits and in whatever format it is encoded in. That said, Hertzler concedes that this may not always yield the best performance for a MapReduce job even if it is the easiest way to do it.

If you have never heard of Platform Symphony, Version 5 of the tool was announced in November 2009 and it sported this neat feature called data affinity. Instead of moving data around of clusters in a machine to computing elements to perform calculations, the Symphony scheduler figured out where the data is and dispatches the much smaller program code to the node with the data on it and has it chew on the data. On certain calculations popularly used in the financial services space where Symphony is used, applications can speed up by an order of magnitude by moving code instead of data around the cluster.

Symphony 5 spans up to 20,000 cores in a single cluster running as many as 5,000 cores per application. Symphony has multicore optimizations to make jobs run more efficiently on modern processors, with all of their cores and threads, and has a feature called MultiCluster that allows multiple Symphony grids to be managed as a single resource pool. This last feature allows for work to be spread across multiple clusters. The kind of thing, says Hertzler, that Hadoop users are wrestling with right now as they have multiple clusters running multiple jobs, Symphony can already do.

Platform is not talking much about how well Symphony will run MapReduce applications using its implementation of the Hadoop, Pig, and Hive APIs, and that is because the Platform MapReduce Workload Manager is still being tweaked. In some cases, the Symphony MapReduce functionality is faster than Hadoop on the same iron, and it never performs slower, according to Hertzler.

"When it comes to the Java execution engine, we are really far ahead of Hadoop," he says, saying that financial institutions using Symphony for their risk analysis are running their programs in under 100 milliseconds ahead of trades. "Performance is important, but companies have service level agreements, and as they roll MapReduce workloads into production, they want a product that has been around for a while and that can deliver on them."

Platform has a few proof of concept customers testing MapReduce support for Symphony right now and will roll out the product later in the summer. The company says in its presentation for the MapReduce support that it will be able to put 40,000 cores in a single Symphony cluster and allocate as many as 10,000 cores to a single applications, and that is twice what Symphony 5 can do. (I guess we know what Symphony 6 will look like.) The future Symphony release will be able to process 17,000 tasks per second with what Platform says is "extremely low latency" of under 1 millisecond.

The company will not be open sourcing Symphony or the MapReduce support code it has created, just as it has kept the most recent versions of its LSF product closed source. (The company did open up an earlier release of its LSF tool to foment the Lava cluster management tool, however.)

The one thing that the Symphony MapReduce release will also have is a price tag that is significantly higher that the free and open source stack. Without the MapReduce functionality, Symphony 5 costs $250,000 for a 100-node cluster, and scales up to millions of dollars for licenses. ®

Security for virtualized datacentres

More from The Register

next story
Just don't blame Bono! Apple iTunes music sales PLUMMET
Cupertino revenue hit by cheapo downloads, says report
The DRUGSTORES DON'T WORK, CVS makes IT WORSE ... for Apple Pay
Goog Wallet apparently also spurned in NFC lockdown
Cray-cray Met Office spaffs £97m on VERY AVERAGE HPC box
Only 250th most powerful in the world? Bring back Michael Fish
Microsoft brings the CLOUD that GOES ON FOREVER
Sky's the limit with unrestricted space in the cloud
'ANYTHING BUT STABLE' Netflix suffers BIG Europe-wide outage
Friday night LIVE? Nope. The only thing streaming are tears down my face
Google roolz! Nest buys Revolv, KILLS new sales of home hub
Take my temperature, I'm feeling a little bit dizzy
IBM, backing away from hardware? NEVER!
Don't be so sure, so-surers
prev story


Why cloud backup?
Combining the latest advancements in disk-based backup with secure, integrated, cloud technologies offer organizations fast and assured recovery of their critical enterprise data.
A strategic approach to identity relationship management
ForgeRock commissioned Forrester to evaluate companies’ IAM practices and requirements when it comes to customer-facing scenarios versus employee-facing ones.
Reg Reader Research: SaaS based Email and Office Productivity Tools
Read this Reg reader report which provides advice and guidance for SMBs towards the use of SaaS based email and Office productivity tools.
Storage capacity and performance optimization at Mizuno USA
Mizuno USA turn to Tegile storage technology to solve both their SAN and backup issues.
The hidden costs of self-signed SSL certificates
Exploring the true TCO for self-signed SSL certificates, including a side-by-side comparison of a self-signed architecture versus working with a third-party SSL vendor.