Platform gets graphic with HPC cluster manager
Supercomputing for Linux noobs
Not everybody who needs to build a cluster wants to be a Linux expert. And that is why Platform Computing has slapped an all-encompassing Web-based graphical user interface onto release 3 of its Platform HPC cluster management tool.
Those who are Linux experts, of course, will be able to fly from command to command as they set up clusters, using the command line interface as they have in prior Platform HPC releases. And they will be able to take advantage of a number of performance enhancements that Platform Computing has added with this rev of the product.
Platform's Load Sharing Facility (LSF) is the flagship job scheduler that the company sells directly for managing very large grids with up to 48,000 cores and up to 200,000 jobs in the queue stacked up to run on the grid. LSF 8, which was announced in November 2010, doubled the scalability of the prior LSF 7 releases. The product is aimed at electronics, auto, and other manufacturers who need very large grids to run their design simulations.
The Platform HPC tools, by contrast, are aimed at smaller customers with less daunting grids and perhaps a lot less expertise in managing clusters. Rather than sell Platform HPC directly, the stack is sold through OEM partners who brand and push the product as part of their cluster sales. Platform HPC OEMs include Cray, Hewlett-Packard, Fujitsu, and Dell; when the HPC stack was completely open source, Red Hat also OEMed it, but when Platform moved its proprietary LSF scheduler into the stack, Red Hat could not resell it since all of its wares need to be open source.
Platform HPC 3 is not based on LSF 8, which would be overkill, but on LSF 7 Update 6, which is the most stable release the company has. With the changes that the company has made, including a new GUI that exposes all of the workload, message passing interface (MPI), and cluster management features as well as the provisioning widgets in the underlying stack, Platform reckons that a typical cluster will go up a whole lot faster. William Lu, director of HPC product marketing at Platform, tells El Reg that customers can spend up to two to three months setting up their clusters, but this can be cut down to weeks or days (depending on how much coffee and Jolt you have on hand) using the HPC 3 tool.
This may not sound like a big deal, but over the three-year economic life of a cluster, if you blow three months setting it up using a hodge-podge of open source tools that are not particularly well integrated, you have lost a twelfth of the cluster's value. (That's not a slam on open source, and Platform contributes to various open source projects and has even sold support for bundles of such code in the past.)
Time to cluster is not the only thing that Platform says gives it an advantage. Lu claims that the LSF job scheduler can deliver anywhere from 2 to 20 per cent better throughput scheduling jobs on a cluster compared to Grid Engine, OpenPBS, and other grid schedulers, so you get more work done. And the company has tuned the MPI libraries in the HPC 3 stack as well, offering as much as 10 per cent better performance than rival open MPI alternatives.
"HPC customers tend to try to bring as many CPUs as possible online and use as much free software as possible," explains Lu. "They haven't including the long learning curve it takes to get such clusters up to full speed. We're starting to see a shift. Customers are looking at both throughout and utilization now. They want to get the cluster up quickly and they want to maintain high throughput throughout the life of the system."
This is not the first GUI that Platform has put into the field. In fact, it is the third generation of the Web interfaces that Platform has put together – this time using Ajax and this time encompassing all, not just some, of the features in the underlying cluster manager.
Platform HPC can provision and manage Linux-based cluster nodes and can also monitor and dual-boot machines that run Microsoft's Windows HPC Server 2008. Most people these days, says Lu, are doing dynamic rebooting anyway because it is so much faster than reprovisioning a node every time the workload changes.
Platform HPC 3 will be available through the company's OEM partners and has a suggested retail price of $550 per node. ®