
Platform Computing doubles up cluster management

Bigger grids, same price, and GPUs too


Supercomputer clusters are getting larger and larger, and that is why Platform Computing has had to revamp its Load Sharing Facility to version 8 and double up the capacity of the workload scheduling software for grids and clusters. The updated LSF also supports GPU co-processors as full citizens of the cluster.

With LSF 7, Platform Computing could manage a cluster that had 24,000 cores and on the order of 100,000 pending jobs, according to Ken Hertzler, vice president of product management at the grid computing pioneer. With LSF 8, which will start shipping in January 2011, a single instance of the cluster management tool will be able to span a cluster with 48,000 cores and 200,000 pending jobs. And if you need to span larger clusters, you can gang up multiple LSF 8 instances to control grids that have 100,000 cores and up to 1.5 million pending jobs.

This may seem like plenty of scalability, but Hertzler says that Platform Computing already has a couple of accounts that have clusters that range from 50,000 to 70,000 cores, so the doubling up of cluster scalability for LSF is not just a matter of providing lots of headroom to most customers. With core counts on the rise in x64 processors from Intel and Advanced Micro Devices to the tune of 30 per cent or so in the coming year and companies simultaneously adding more nodes to clusters, Platform Computing has to broaden its core and pending job counts. In fact, it won't be long before Platform Computing has to jack up the core counts some more.

LSF 8 is more than a tweaked version of the code with twice the cluster scalability, and Hertzler says it is the first major release of the product since LSF 7 shipped four years ago. And now it speaks GPU as well as CPU.

Platform Computing's entry and midrange cluster management tool, Platform HPC 2.1, was announced last week ahead of the SC10 supercomputing conference, and it was the first program put out by the company to be able to directly schedule jobs on GPU co-processors. Now the full-on LSF scheduler, which is the flagship product from Platform Computing, has this capability. With the GPU support in LSF 8, jobs can be dispatched directly to GPUs, and the scheduler has the smarts to see their utilization and thermals so it can distribute workloads to avoid creating hot spots in the cluster.

Whether you use CPUs alone or a mix of CPUs and GPUs in your workloads (you can't actually run an operating system and applications directly on a GPU - yet), LSF 8 has a number of performance and scalability enhancements that can help boost the utilization on your clusters. An important new feature is called guaranteed resources, which is designed to make sure jobs get the resources they need to run to meet the service level agreements that people require when they submit jobs. Because resources could not be guaranteed in prior releases, cluster administrators often had to carve their clusters up into silos. Higher priority jobs locked up resources that often just sat there waiting for their job to start, while lower priority jobs did not finish as quickly as they might have had they been given short-term access to those siloed resources.

With guaranteed resources, which are driven by SLAs set by cluster administrators, the scheduler finds the best way to meet the SLAs without partitioning up the cluster. The scheduler also now has pre-emptive and fair-share scheduling policies, which allow LSF to pre-empt jobs and steal resources temporarily from one job to help meet another job's SLA while still allowing the first job to meet its own SLAs. Basically, the software lets a bunch of small jobs say: "Hold on a minute until I finish and then you can have a lot more CPUs, big job."
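To make the silo-versus-guarantee contrast concrete, here is a toy Python model. It is only a sketch under stated assumptions and not LSF's actual scheduling logic: the function names, the core counts, and the idea of reducing each job class to a single demand figure are all hypothetical, and pre-emption is not modelled at all.

```python
# Toy model only: hypothetical names and numbers, not LSF's algorithm.

def idle_cores_siloed(total, reserved, high_demand, low_demand):
    """Static partition: the reserved silo is off limits to low-priority work."""
    high_used = min(high_demand, reserved)
    low_used = min(low_demand, total - reserved)
    return total - high_used - low_used


def idle_cores_guaranteed(total, guaranteed, high_demand, low_demand):
    """Shared pool: high-priority work is served first, up to its guarantee,
    and low-priority work may borrow whatever cores are left over."""
    high_used = min(high_demand, guaranteed)
    low_used = min(low_demand, total - high_used)
    return total - high_used - low_used


if __name__ == "__main__":
    # Assumed scenario: 100 cores, 40 set aside for high-priority jobs,
    # but high-priority work currently needs only 10 of them.
    print("siloed idle cores:    ", idle_cores_siloed(100, 40, 10, 90))
    print("guaranteed idle cores:", idle_cores_guaranteed(100, 40, 10, 90))
```

In this assumed scenario the siloed cluster leaves 30 cores idle while the shared pool leaves none. A real scheduler would also have to hand borrowed cores back when high-priority demand rises, which is where pre-emption comes in.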

The performance improvements moving from LSF 7 to LSF 8 on a given cluster will vary by jobs and system configuration, and there won't be much of an improvement if customers are already up near 100 per cent utilization. But Hertzler says that customers who currently get maybe 60 to 70 per cent utilization on clusters running a large number of mixed workloads might be able to squeeze another 10 to 20 per cent utilization out of them (and therefore get the same work done in a shorter period of time), and that is a significant improvement.
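A back-of-the-envelope check on what that utilization gain could mean for run times, sketched in Python. It assumes the "10 to 20 per cent" figure means percentage points added to utilization (the article does not say), and that wall-clock time for a fixed batch of work scales inversely with utilization. The function name is hypothetical.

```python
# Rough arithmetic only; assumes time for fixed work is proportional to 1 / utilization.

def relative_runtime(old_utilization, new_utilization):
    """Fraction of the original wall-clock time needed for the same work."""
    if not (0 < old_utilization <= 1 and 0 < new_utilization <= 1):
        raise ValueError("utilization must be in (0, 1]")
    return old_utilization / new_utilization


if __name__ == "__main__":
    # Starting points and gains taken from the ranges quoted above.
    for old, gain in [(0.60, 0.10), (0.60, 0.20), (0.70, 0.10), (0.70, 0.20)]:
        new = old + gain
        ratio = relative_runtime(old, new)
        print(f"{old:.0%} -> {new:.0%}: same work in about {ratio:.0%} of the time")
```

Under those assumptions, a cluster going from 60 to 80 per cent utilization would finish the same work in about three quarters of the time.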

LSF 8 also has a new administrative rights delegation feature, which gets the cluster administrator out of the politics of who gets to use what cluster when. Now, supercomputer center or business line managers who have access to the cluster can add and remove users from the list of people who have access to the cluster to submit jobs and determine the service level they want for specific jobs. The LSF administrator then gets back to the job of managing the cluster, not answering cranky phone calls from people who all think they deserve special treatment.

LSF 8 can dispatch work to clusters running various Linuxes, Unixes, and Windows operating systems as well as Mac OS X; Platform Computing publishes a full list of the supported platforms. LSF 8 has the same price as LSF 7, and customers on a support contract with Platform Computing can upgrade at no charge. While Platform Computing provides pricing for its HPC 2.1 stack, it does not reveal its prices for the LSF tool, except to say it charges on a per-core basis with site-wide (and presumably volume discounted) licenses available. ®
