Platform clusters Windows HPC with Linux
Tetris with supercomputers
Platform Computing has taken the beta off its Infrastructure Sharing Facility, a way of greasing the adoption of Windows HPC Server 2008 among the supercomputing folk.
Having seen a zillion different operating systems and architectures over the past three decades, these HPC techies like the portability and commonality of Linux across the remaining incompatible server platforms. But Windows not being Linux is not an insurmountable barrier.
The HPC variant of Windows runs on x64 iron, which is what most supercomputer centers use if you don't count Itanium-based machines or the exotic custom-made and often hybrid supers installed at the largest (and usually government-sponsored) facilities. (Windows HPC Server could run on Itanium boxes, but it doesn't.)
Let's assume there are some codes, perhaps in newbie fields like the life sciences, where Windows has a chance to become a platform for HPC. There's still a problem. It's a pain in the neck to make a parallel supercomputer cluster dual-boot, much less make it adjust to workloads on the fly and schedule a collection of Linux or Windows nodes to support upcoming work.
This problem is, as it turns out, one of the things that Platform Computing was thinking about when it launched the beta of Infrastructure Sharing Facility, or ISF, back in June. At the time, Platform was pitching ISF as a means of deploying multiple virtual machine hypervisors on server clusters to create clouds, but ISF can also be used to slap Linux or Windows HPC Server onto server nodes used for supercomputing work.
As ISF becomes generally available today, Platform is also launching a feature called ISF Adaptive Cluster, which can schedule OS images on server nodes to meet whatever applications need as they are put into the master ISF scheduler.
"We are making static HPC dynamic," says Martin Harris, director of product management at Platform. "There is a lot of Tetris you could play with workloads, across multiple operating systems, to make clusters more efficient."
While scientists are no less prone to server hugging than business department heads, the need to drive more efficiency out of the IT gear is no less important in the HPC center than it is in the corporate data center. It is just that, perhaps, supercomputer centers have been able to get away with lower efficiency for longer. With ISF and its Adaptive Cluster rapid provisioning tools, which can hook into dual-boot or network images for OS provisioning on server nodes in a cluster, Harris says that Platform can drive the utilization of a cluster from somewhere around the typical 40 to 50 per cent level up as high as 80 to 90 per cent - provided you can get people to share a machine and assign higher and lower priorities to workloads. Platform ISF, as it turns out, actually gets to play the workload Tetris, not system administrators.
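To make the workload Tetris concrete, here is a toy sketch in Python of priority-driven OS provisioning. It is not Platform's scheduler or API: the `Node`, `Job`, and `schedule` names are hypothetical, and the "reprovisioning" is just a field assignment standing in for a dual-boot switch or network image load.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    os: str            # OS image currently booted, e.g. "linux" or "windows"
    busy: bool = False

@dataclass
class Job:
    name: str
    os: str            # OS the application needs
    priority: int      # larger number = more important

def schedule(jobs, nodes):
    """Assign each job to one idle node, reprovisioning only when needed.

    Returns a list of (job name, node name, reprovisioned?) tuples.
    """
    placements = []
    # Highest-priority work gets first pick of the idle nodes.
    for job in sorted(jobs, key=lambda j: -j.priority):
        idle = [n for n in nodes if not n.busy]
        if not idle:
            break  # cluster is full; remaining jobs wait in the queue
        # Prefer a node that already runs the right OS to avoid a reboot.
        match = next((n for n in idle if n.os == job.os), None)
        node = match if match is not None else idle[0]
        reprovisioned = node.os != job.os
        node.os = job.os   # stand-in for a dual-boot switch or network image load
        node.busy = True
        placements.append((job.name, node.name, reprovisioned))
    return placements

# Hypothetical three-node cluster, all booted into Linux to start with.
nodes = [Node("n1", "linux"), Node("n2", "linux"), Node("n3", "linux")]
jobs = [
    Job("cfd-run", "linux", priority=5),
    Job("genome-align", "windows", priority=9),
    Job("render", "linux", priority=1),
]
placements = schedule(jobs, nodes)
for placement in placements:
    print(placement)
```

In this made-up run, the high-priority Windows job forces one node to be re-imaged, while the two Linux jobs land on nodes that already run the right OS. A real scheduler would also have to weigh reboot time, queue wait, and preemption, none of which is modeled here.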
The Microsoft partnership with Platform does not involve money and marketing - at least not yet. It's lab work to make sure ISF Adaptive Cluster works as well with Windows HPC Server 2008 as it does with commercial Linuxes. The ISF uber-management tool can make use of the LSF tool that Platform created for HPC workloads or the Symphony SOA-style workload manager that Platform created for financial services applications (such as risk analysis, pricing, actuarial, anti-money laundering, and other analytical work) running on clustered servers.
Platform is charging $795 per server socket for a base ISF license. Pricing for the Adaptive Cluster feature was not divulged. A 30-day free trial is available if you want to kick it around.
Speaking of Symphony, the new release 5 is getting ready to ship in tandem with the Supercomputing 09 trade show next week. The neat new feature with Symphony 5 is called data affinity, and it turns data loading and computing in a cluster on its head.
With data affinity, you take a large data set and spread it out over the cluster, and then when you need to do a calculation, you do the math on the node that already has the data on it. That is the reverse of what clusters tend to do, which is load the data, assign a calculation to a node, and then try to move the data around. You spend a lot of energy and time moving data around when you do it the traditional way. On the codes that Platform has tested Symphony 5 with, doing the calculations where the data already is can speed up certain calculations by an order of magnitude.
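The idea can be sketched in a few lines of Python. This is an illustration of data-local task placement in general, not Symphony's API: the partition map, node names, and the `plan_tasks` and `run_on_node` helpers are all hypothetical, and a dictionary stands in for the data held on each node.

```python
from collections import defaultdict

# Hypothetical layout: which node already holds which partition of the data set.
partition_location = {"p0": "node-a", "p1": "node-b", "p2": "node-a", "p3": "node-c"}

# Hypothetical in-memory stand-in for the data stored on each node.
node_storage = {
    "node-a": {"p0": [1, 2, 3], "p2": [7, 8]},
    "node-b": {"p1": [4, 5, 6]},
    "node-c": {"p3": [9, 10]},
}

def plan_tasks(partitions):
    """Group tasks by the node that holds their data, so no data has to move."""
    plan = defaultdict(list)
    for part in partitions:
        plan[partition_location[part]].append(part)
    return dict(plan)

def run_on_node(node, parts):
    """Compute partial sums using only the data local to `node`."""
    local = node_storage[node]
    return {part: sum(local[part]) for part in parts}

plan = plan_tasks(["p0", "p1", "p2", "p3"])
partials = {}
for node, parts in plan.items():
    partials.update(run_on_node(node, parts))

# Only the small partial results cross the network, not the data set itself.
total = sum(partials.values())
print(plan)
print(total)
```

The calculation is routed to wherever each partition lives, and only the per-partition results are gathered at the end. How Symphony 5 actually tracks data placement is not described here, so the lookup table is purely an assumption.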
The updated Symphony software also includes a feature called the multicore optimizer, which does exactly what the name suggests: reduces contention for memory and I/O in the cluster by taking advantage of the multiple cores and multiple threads in modern servers.
Platform's Symphony 5 can run on clusters with up to 20,000 CPUs and with as many as 5,000 cores allocated to parallel applications. Symphony 5 costs from the low hundreds of thousands of dollars to millions of dollars, and it's priced per core. On a typical 100-node cluster, you are talking about a cool $250,000 for the software. ®