
Platform Computing doubles up cluster management

Bigger grids, same price, and GPUs too


Supercomputer clusters are getting larger and larger, and that is why Platform Computing has had to revamp its Load Sharing Facility to version 8 and double up the capacity of the workload scheduling software for grids and clusters. The updated LSF also supports GPU co-processors as full citizens of the cluster.

With LSF 7, Platform Computing could manage a cluster that had 24,000 cores and on the order of 100,000 pending jobs, according to Ken Hertzler, vice president of product management at the grid computing pioneer. With LSF 8, which will start shipping in January 2011, a single instance of the cluster management tool will be able to span a cluster comprising 48,000 cores and 200,000 pending jobs. And if you need to span larger clusters, you can gang up multiple LSF 8 instances to control grids that have 100,000 cores and up to 1.5 million pending jobs.

This may seem like plenty of scalability, but Hertzler says that Platform Computing already has a couple of accounts that have clusters that range from 50,000 to 70,000 cores, so the doubling up of cluster scalability for LSF is not just a matter of providing lots of headroom to most customers. With core counts on the rise in x64 processors from Intel and Advanced Micro Devices to the tune of 30 per cent or so in the coming year and companies simultaneously adding more nodes to clusters, Platform Computing has to broaden its core and pending job counts. In fact, it won't be long before Platform Computing has to jack up the core counts some more.

LSF 8 is more than a tweaked version of the code with twice the cluster scalability, and Hertzler says it is the first major release of the product since LSF 7 shipped four years ago. And now it speaks GPU as well as CPU.

Platform Computing's entry and midrange cluster management tool, Platform HPC 2.1, was announced last week ahead of the SC10 supercomputing conference, and it was the first program put out by the company to be able to directly schedule jobs on GPU co-processors. Now the full-on LSF scheduler, which is the flagship product from Platform Computing, has this capability. With the GPU support in LSF 8, jobs can be dispatched directly to GPUs, and the scheduler has the smarts to see utilization and thermals for the GPUs so it can distribute workloads to avoid creating hot spots in the cluster.
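
To make the idea of utilization-aware and thermal-aware dispatch concrete, here is a minimal Python sketch. It is not LSF's algorithm or API, which the article does not describe; the `Gpu` record, the `pick_gpu` helper, and the 85-degree cap are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Gpu:
    name: str           # hypothetical device label
    utilization: float  # fraction of the GPU in use, 0.0 to 1.0
    temp_c: float       # reported temperature in degrees Celsius


def pick_gpu(gpus, max_temp_c=85.0):
    """Return the least-utilized GPU below the temperature cap, or None.

    The cap is an assumed threshold, not a value taken from LSF.
    """
    eligible = [g for g in gpus if g.temp_c < max_temp_c]
    if not eligible:
        return None
    # Prefer low utilization first, then the cooler device as a tie-breaker.
    return min(eligible, key=lambda g: (g.utilization, g.temp_c))


gpus = [
    Gpu("node1-gpu0", 0.20, 88.0),  # lightly loaded but running hot
    Gpu("node2-gpu0", 0.55, 70.0),
    Gpu("node3-gpu0", 0.40, 72.0),
]
chosen = pick_gpu(gpus)
print(chosen.name if chosen else "no GPU available")
```

In this toy example the idlest GPU is skipped because it is already too hot, which is the kind of hot-spot avoidance the paragraph above describes.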

Whether you use CPUs or a mix of CPUs and GPUs in your workloads (you can't actually run an operating system and applications directly on a GPU - yet), LSF 8 has a number of performance and scalability enhancements that can help boost the utilization on your clusters. An important new feature is called guaranteed resources, which is designed to make sure jobs get the resources they need to run to meet the service level agreements that people require when they submit jobs. Because resources could not be guaranteed in prior releases, cluster administrators often had to carve their clusters up into silos, with higher priority jobs locking up resources that are often just sitting there, waiting for their job to start, and lower priority jobs not finishing as quickly as they might have had they had short-term access to those siloed resources.

With guaranteed resources, which are driven by SLAs set by cluster administrators, the scheduler finds the best way to meet the SLAs without partitioning up the cluster. The scheduler also now has pre-emptive and fair-share scheduling policies, which allow LSF to pre-empt jobs and steal resources temporarily from one job to help meet one SLA while at the same time allowing the second job to meet its own SLA. Basically, the software lets a bunch of small jobs say: "Hold on a minute until I finish and then you can have a lot more CPUs, big job."
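
One way to picture a guarantee without a silo is a single pool in which idle guaranteed cores can be borrowed and then reclaimed. The Python sketch below is a toy model under stated assumptions, not Platform Computing's implementation: the `Cluster` class, its method names, and the pre-empt-on-demand policy are all hypothetical.

```python
class Cluster:
    """Toy pool: SLA jobs are guaranteed cores, best-effort jobs may borrow them."""

    def __init__(self, total_cores, guaranteed_cores):
        # Assumes guaranteed_cores <= total_cores.
        self.total = total_cores
        self.guaranteed = guaranteed_cores
        self.sla_used = 0
        self.best_effort_used = 0

    def free(self):
        return self.total - self.sla_used - self.best_effort_used

    def start_best_effort(self, cores):
        """Best-effort work may use any idle core, including guaranteed ones."""
        if cores > self.free():
            return False
        self.best_effort_used += cores
        return True

    def start_sla(self, cores):
        """Start SLA work within the guarantee, pre-empting borrowers if needed.

        Returns the number of best-effort cores pre-empted,
        or None if the request exceeds the guarantee.
        """
        if self.sla_used + cores > self.guaranteed:
            return None
        shortfall = max(0, cores - self.free())
        self.best_effort_used -= shortfall  # reclaim borrowed cores
        self.sla_used += cores
        return shortfall


cluster = Cluster(total_cores=100, guaranteed_cores=40)
cluster.start_best_effort(90)      # borrows 30 of the 40 guaranteed cores
preempted = cluster.start_sla(30)  # SLA work arrives and reclaims what it needs
print(preempted, cluster.best_effort_used, cluster.sla_used)
```

The point of the sketch is that the 40 guaranteed cores are never fenced off: they do useful best-effort work while idle and are only taken back when an SLA job actually needs them.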

The performance improvements moving from LSF 7 to LSF 8 on a given cluster will vary by jobs and system configuration, and there won't be much of an improvement if customers are already up near 100 per cent utilization. But Hertzler says for those customers who are maybe able to get 60 to 70 per cent utilization on their clusters running a large number of mixed workloads, they might be able to squeeze another 10 to 20 per cent utilization out of their clusters (and therefore get the same work done in a shorter period of time), and that is a significant improvement.
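
The link between utilization and turnaround time can be checked with simple arithmetic. The sketch below assumes the quoted gain means percentage points (60 per cent rising to 70 or 80 per cent, for example) and that the same fixed amount of work is run before and after; both are assumptions, and the `relative_runtime` helper is hypothetical.

```python
def relative_runtime(old_util, new_util):
    """Runtime after the upgrade as a fraction of runtime before.

    Assumes a fixed amount of work on the same cluster, so elapsed time
    scales inversely with utilization.
    """
    return old_util / new_util


for old, gain in [(0.60, 0.10), (0.60, 0.20), (0.70, 0.10), (0.70, 0.20)]:
    new = old + gain
    saved = 1.0 - relative_runtime(old, new)
    print(f"{old:.0%} -> {new:.0%} utilization: about {saved:.1%} less time")
```

Under these assumptions, moving from 60 to 70 per cent utilization shortens the time for the same work by roughly 14 per cent, and moving from 60 to 80 per cent shortens it by about a quarter.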

LSF 8 also has a new administrative rights delegation feature, which gets the cluster administrator out of the politics of who gets to use what cluster when. Now, supercomputer center or business line managers who have access to the cluster can add and remove users from the list of people who have access to the cluster to submit jobs and determine the service level they want for specific jobs. The LSF administrator then gets back to the job of managing the cluster, not answering cranky phone calls from people who all think they deserve special treatment.

LSF 8 can dispatch work to clusters running various Linuxes, Unixes, and Windows operating systems as well as Mac OS X; a full list of the supported platforms is available from Platform Computing. LSF 8 has the same price as LSF 7, and customers on a support contract with Platform Computing can upgrade at no charge. While Platform Computing provides pricing for its HPC 2.1 stack, it does not reveal its prices for the LSF tool, except to say it charges on a per-core basis with site-wide (and presumably volume discounted) licenses available. ®
