Adaptive Computing speaks better GPU with Moab 6
Torque is cheap
SC10 Supercomputer clusters are getting larger every year, and now they are getting math help from adjunct devices such as GPU co-processors.
Cluster provisioning and management tools therefore have to scale from tens of thousands of cores to hundreds of thousands – without choking on their own communication with cluster nodes. They also have to be aware of co-processors and keep them fed.
Scalability and GPU support are therefore the key new features in the Moab Cluster Suite 6.0 cluster management tool and its companion Moab Viewpoint 2.0 console.
On the scalability front, the current Moab 5.4 release can scale to clusters with 40,000 to 50,000 nodes, but Peter ffolkes, vice president of marketing for Adaptive Computing, says it won't be long before clusters grow to many more nodes and multiple millions of processor cores, which presents the cluster-management tool with a much larger communication problem. This is why the company gutted the underlying communication system linking the Moab console to the server nodes and their computing resources, making it more efficient and improving the tool's response time.
Specifically, Adaptive Computing has enhanced the multi-threading of the Moab communication stack, which is written in C, so the probes that track the performance of the nodes and cores out in the cluster do not interfere with the scheduler at the heart of the Moab tool. The streamlined node communication protocols run 100 times faster, and the result is that Moab 6 takes fewer resources than Moab 5.4 to run on a cluster of a given size.
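The idea of keeping node probes off the scheduler's critical path can be illustrated with a minimal sketch. This is not Moab's code (its stack is written in C and its internals are not described here); the function names `probe_node`, `start_probes`, and `drain_updates` are hypothetical, and the probe result is faked. Worker threads push status reports onto a queue, and the scheduler side folds in whatever has arrived without ever blocking on a slow node.

```python
import queue
import threading

def probe_node(node_id):
    """Hypothetical probe; a real one would query the node over the network."""
    return {"load": node_id % 4}  # faked load figure, for illustration only

def start_probes(node_ids, updates, workers=4):
    """Probe nodes on worker threads, pushing results onto the updates queue."""
    pending = queue.Queue()
    for node_id in node_ids:
        pending.put(node_id)

    def worker():
        while True:
            try:
                node_id = pending.get_nowait()
            except queue.Empty:
                return  # no nodes left to probe
            updates.put((node_id, probe_node(node_id)))

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    return threads

def drain_updates(updates, status):
    """Scheduler side: merge any reports that have arrived, never blocking."""
    while True:
        try:
            node_id, info = updates.get_nowait()
        except queue.Empty:
            return status
        status[node_id] = info
```

A scheduler loop would call `drain_updates` once per pass and then make its decisions from the `status` dictionary, so a slow or unresponsive node delays only its own report.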
By the way, Adaptive Computing has not had to change the node-count scalability above that 50,000-node level because, as ffolkes puts it, "no one is getting anywhere near that yet."
So the scalability improvements with Moab 6.0 are about how the tool feels when it is running on growing clusters. It will be less sluggish in terms of response time, but absolute scaling is the same.
The new Moab 6.0 tool also has much-improved GPU management capabilities, which are an increasingly common requirement at HPC shops. Moab 5.4 was able to designate an x64 server node in a cluster as one with a GPU, but system designers are getting clever about how they lash GPUs to servers, and there is not always a permanent, one-to-one relationship between the CPUs and the GPUs. Some nodes in a cluster have multiple GPUs, and NextIO, Dell, and others are making special outboard GPU enclosures that can be attached to multiple server nodes and reassigned on the fly.
Obviously, a cluster job scheduler needs to be able not only to see the GPUs, but configure the GPUs to specific nodes and then dispatch work to them. To that end, Moab 6.0 includes the Torque 2.5.4 open source resource manager, which allows Moab 6.0 to gather up detailed information about the GPUs and how they can be configured to servers.
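To make the scheduling requirement concrete, here is a minimal sketch of GPU-aware placement. It is an illustration only, not how Moab or Torque implement it: the `Node` and `Job` classes and the `place` function are hypothetical, and the policy shown is plain first-fit. The point is simply that the scheduler must track free GPUs per node alongside free cores, and reserve both when it dispatches a job.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    free_cores: int
    free_gpus: int

@dataclass
class Job:
    name: str
    cores: int
    gpus: int = 0  # jobs that need no GPU leave this at zero

def place(job, nodes):
    """First-fit placement: reserve resources on the first node that has
    enough free cores and GPUs, and return its name (None if nothing fits)."""
    for node in nodes:
        if node.free_cores >= job.cores and node.free_gpus >= job.gpus:
            node.free_cores -= job.cores
            node.free_gpus -= job.gpus
            return node.name
    return None
```

With one CPU-only node and one node holding two GPUs, a job asking for two GPUs lands on the second node, and a later GPU job gets `None` until those GPUs are released.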
Finally, Moab 6.0 includes an updated Viewpoint 2.0 Web-based management console, which brings over more features from the company's earlier Moab Access Portal and Control Manager fat-client, Java-based management console. With the Viewpoint 2.0 release, ffolkes says that most of the commands from the old tools are in the new one, plus additional features to manage GPUs and virtual machines in cloudy infrastructure. The new tool is written in a mix of Java and the Google Web Toolkit (GWT). Among other things, the Viewpoint console can now be used to manage physical and virtual nodes in large-scale HPC or commercial clusters, to kick off migrations of virtual machines around a cluster, and to move workloads from one physical server to another.
Moab Cluster Suite 6.0 runs on Linux-based servers, and one machine can manage a cluster with tens of thousands of nodes. For larger installations, you can federate Moab controller servers and carve a cluster up into domains for each Moab machine to manage.
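Carving a cluster into domains can be pictured with a short sketch. The function below is hypothetical and makes no claim about how Moab assigns nodes to federated controllers; it just splits a list of node names into contiguous, near-equal slices, one per controller. The controller names in the usage note are invented.

```python
def carve_domains(node_names, controllers):
    """Split a list of node names into contiguous, near-equal domains,
    one per controller. Earlier controllers absorb any remainder."""
    if not controllers:
        raise ValueError("at least one controller is required")
    base, extra = divmod(len(node_names), len(controllers))
    domains, start = {}, 0
    for i, controller in enumerate(controllers):
        size = base + (1 if i < extra else 0)
        domains[controller] = node_names[start:start + size]
        start += size
    return domains
```

Calling `carve_domains` with ten node names and three controllers such as `"moab-a"`, `"moab-b"`, and `"moab-c"` yields domains of four, three, and three nodes.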
Moab's Adaptive Computing Suite extensions to the core Moab Cluster Suite can manage both Linux and Windows HPC Server 2008 R2 images. Moab Cluster Suite costs under $100 per server socket, with Adaptive Computing Suite costing under $300 per socket, according to ffolkes. He said the Moab stack usually represents between 3 and 5 per cent of a cluster node's cost. ®