Feeds

Platform wants to out-map, out-reduce Hadoop

Teaching financial grids to dance like stuffed elephants

The Essential Guide to IT Transformation

Why would Platform do that?

Why would Platform do that? Because companies are already generating and storing data in their file systems and they don't want to port their data from one file system to HDFS. Then they have a data porting problem every time they want to run a job. Platform says it makes much more sense to chew on the big data where it sits and in whatever format it is encoded in. That said, Hertzler concedes that this may not always yield the best performance for a MapReduce job even if it is the easiest way to do it.

If you have never heard of Platform Symphony, Version 5 of the tool was announced in November 2009 and it sported this neat feature called data affinity. Instead of moving data around of clusters in a machine to computing elements to perform calculations, the Symphony scheduler figured out where the data is and dispatches the much smaller program code to the node with the data on it and has it chew on the data. On certain calculations popularly used in the financial services space where Symphony is used, applications can speed up by an order of magnitude by moving code instead of data around the cluster.

Symphony 5 spans up to 20,000 cores in a single cluster running as many as 5,000 cores per application. Symphony has multicore optimizations to make jobs run more efficiently on modern processors, with all of their cores and threads, and has a feature called MultiCluster that allows multiple Symphony grids to be managed as a single resource pool. This last feature allows for work to be spread across multiple clusters. The kind of thing, says Hertzler, that Hadoop users are wrestling with right now as they have multiple clusters running multiple jobs, Symphony can already do.

Platform is not talking much about how well Symphony will run MapReduce applications using its implementation of the Hadoop, Pig, and Hive APIs, and that is because the Platform MapReduce Workload Manager is still being tweaked. In some cases, the Symphony MapReduce functionality is faster than Hadoop on the same iron, and it never performs slower, according to Hertzler.

"When it comes to the Java execution engine, we are really far ahead of Hadoop," he says, saying that financial institutions using Symphony for their risk analysis are running their programs in under 100 milliseconds ahead of trades. "Performance is important, but companies have service level agreements, and as they roll MapReduce workloads into production, they want a product that has been around for a while and that can deliver on them."

Platform has a few proof of concept customers testing MapReduce support for Symphony right now and will roll out the product later in the summer. The company says in its presentation for the MapReduce support that it will be able to put 40,000 cores in a single Symphony cluster and allocate as many as 10,000 cores to a single applications, and that is twice what Symphony 5 can do. (I guess we know what Symphony 6 will look like.) The future Symphony release will be able to process 17,000 tasks per second with what Platform says is "extremely low latency" of under 1 millisecond.

The company will not be open sourcing Symphony or the MapReduce support code it has created, just as it has kept the most recent versions of its LSF product closed source. (The company did open up an earlier release of its LSF tool to foment the Lava cluster management tool, however.)

The one thing that the Symphony MapReduce release will also have is a price tag that is significantly higher that the free and open source stack. Without the MapReduce functionality, Symphony 5 costs $250,000 for a 100-node cluster, and scales up to millions of dollars for licenses. ®

The Essential Guide to IT Transformation

More from The Register

next story
Sysadmin Day 2014: Quick, there's still time to get the beers in
He walked over the broken glass, killed the thugs... and er... reconnected the cables*
Auntie remains MYSTIFIED by that weekend BBC iPlayer and website outage
Still doing 'forensics' on the caching layer – Beeb digi wonk
Microsoft says 'weird things' can happen during Windows Server 2003 migrations
Fix coming for bug that makes Kerberos croak when you run two domain controllers
Multipath TCP speeds up the internet so much that security breaks
Black Hat research says proposed protocol will bork network probes, flummox firewalls
Cisco says network virtualisation won't pay off everywhere
Another sign of strain in the Borg/VMware relationship?
Forrester says Australia, not China, is next boom market for cloud
It's cloudy but fine down under, analyst says
prev story

Whitepapers

Implementing global e-invoicing with guaranteed legal certainty
Explaining the role local tax compliance plays in successful supply chain management and e-business and how leading global brands are addressing this.
Boost IT visibility and business value
How building a great service catalog relieves pressure points and demonstrates the value of IT service management.
Why and how to choose the right cloud vendor
The benefits of cloud-based storage in your processes. Eliminate onsite, disk-based backup and archiving in favor of cloud-based data protection.
The Essential Guide to IT Transformation
ServiceNow discusses three IT transformations that can help CIO's automate IT services to transform IT and the enterprise.
Maximize storage efficiency across the enterprise
The HP StoreOnce backup solution offers highly flexible, centrally managed, and highly efficient data protection for any enterprise.