
Platform wants to out-map, out-reduce Hadoop

Teaching financial grids to dance like stuffed elephants



Why would Platform do that? Because companies are already generating and storing data in their own file systems, and they don't want to port that data from one file system to HDFS. If they do, they have a data porting problem every time they want to run a job. Platform says it makes much more sense to chew on big data where it sits, in whatever format it is encoded. That said, Hertzler concedes that this may not always yield the best performance for a MapReduce job, even if it is the easiest way to do it.

If you have never heard of Platform Symphony, Version 5 of the tool was announced in November 2009, and it sported a neat feature called data affinity. Instead of shipping data around the nodes of a cluster to the compute elements that perform calculations, the Symphony scheduler figures out where the data already sits and dispatches the much smaller program code to the node holding it, letting the code chew on the data in place. On certain calculations popular in the financial services space, where Symphony is used, moving code instead of data around the cluster can speed applications up by an order of magnitude.
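If that sounds abstract, here is a minimal sketch in Java of what data-affinity placement boils down to. The class and method names here are hypothetical, and this is not Symphony's actual scheduler; it just shows the core trick of sending the task to whichever node already owns the input block, rather than copying the block out to a free node.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of data-affinity scheduling. A real scheduler also
// weighs node load, failures, and replica choices; this shows only the
// placement decision itself.
public class AffinitySketch {

    // Which node holds each data block, built from file system metadata.
    private final Map<String, String> blockOwner = new HashMap<>();

    public void recordBlock(String blockId, String nodeId) {
        blockOwner.put(blockId, nodeId);
    }

    // Prefer the node that owns the task's input block; fall back to any
    // live node (and pay the data-transfer cost) only when no owner is up.
    public String placeTask(String inputBlockId, List<String> liveNodes) {
        String owner = blockOwner.get(inputBlockId);
        if (owner != null && liveNodes.contains(owner)) {
            return owner;               // code moves, data stays put
        }
        return liveNodes.get(0);        // remote read: the slow path
    }

    public static void main(String[] args) {
        AffinitySketch sched = new AffinitySketch();
        sched.recordBlock("trades-2011-06.dat", "node07");
        System.out.println(sched.placeTask("trades-2011-06.dat",
                List.of("node01", "node07", "node12")));  // prints node07
    }
}

The order-of-magnitude speedups come from the asymmetry: the program code being dispatched is measured in kilobytes, while the data it chews on can run to gigabytes per node.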

Symphony 5 spans up to 20,000 cores in a single cluster and can dedicate as many as 5,000 cores to a single application. Symphony has multicore optimizations to make jobs run more efficiently on modern processors, with all of their cores and threads, and has a feature called MultiCluster that allows multiple Symphony grids to be managed as a single resource pool, spreading work across multiple clusters. This kind of multi-cluster juggling, says Hertzler, is exactly what Hadoop users are wrestling with right now as they run multiple jobs on multiple clusters; Symphony can already do it.

Platform is not talking much about how well Symphony will run MapReduce applications using its implementation of the Hadoop, Pig, and Hive APIs, and that is because the Platform MapReduce Workload Manager is still being tweaked. In some cases, the Symphony MapReduce functionality is faster than Hadoop on the same iron, and it never performs slower, according to Hertzler.
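To make that compatibility claim concrete, below is the canonical word-count job written against the stock org.apache.hadoop.mapreduce API. Nothing in it is Symphony-specific; the pitch is that a job like this should run unchanged whether the engine underneath is Hadoop's own or Platform's workload manager.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// The standard Hadoop word-count example: the mapper emits (word, 1)
// pairs, the reducer sums the counts for each word.
public class WordCount {

    public static class TokenMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);   // emit (word, 1)
            }
        }
    }

    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The Pig and Hive APIs sit a layer above this one, compiling scripts and SQL-ish queries down to MapReduce jobs of the same shape.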

"When it comes to the Java execution engine, we are really far ahead of Hadoop," he says, saying that financial institutions using Symphony for their risk analysis are running their programs in under 100 milliseconds ahead of trades. "Performance is important, but companies have service level agreements, and as they roll MapReduce workloads into production, they want a product that has been around for a while and that can deliver on them."

Platform has a few proof-of-concept customers testing the MapReduce support for Symphony right now and will roll out the product later in the summer. In its presentation for the MapReduce support, the company says it will be able to put 40,000 cores in a single Symphony cluster and allocate as many as 10,000 cores to a single application, twice what Symphony 5 can do. (I guess we know what Symphony 6 will look like.) The future Symphony release will be able to process 17,000 tasks per second with what Platform says is "extremely low latency" of under 1 millisecond.

The company will not be open sourcing Symphony or the MapReduce support code it has created, just as it has kept the most recent versions of its LSF product closed source. (The company did open up an earlier release of its LSF tool to seed the Lava cluster management tool, however.)

The one thing the Symphony MapReduce release will also have is a price tag significantly higher than that of the free and open source Hadoop stack. Without the MapReduce functionality, Symphony 5 costs $250,000 for a 100-node cluster, and licenses scale up to millions of dollars from there. ®
