Microsoft and Novell tag team on HPC
Windows on the supercomputing world
Comment The old Microsoft strategy of "embrace, extend, and extinguish" is just not going to fly in the snooty high performance computing market. Microsoft needs partners, and Windows needs to coexist with Linux, if the company wants to get anything more than a token share of real HPC work. That is why the company is talking up its interoperability work with Novell at the International Supercomputing Conference (ISC 2010) in Hamburg, Germany this week.
As we all know, Microsoft got to where it is today by crushing the competition in that triple-E bear hug. But with the Windows HPC Server 2008 variant of its server platform, the company has little choice but to adopt an approach we'll call "interoperate, cooperate, and perhaps dominate." That means cooperating not just with Linux suppliers such as Red Hat and Novell (which each have interoperability partnerships with Microsoft), but also with cluster management tool providers, so that Windows can get into position as a second boot option on x64-based clusters.
Microsoft and Novell were at ISC talking up the work they have done in their joint interoperability lab in Cambridge, Massachusetts, and the 33 joint customers they have running both Windows and Linux on HPC clusters. The two companies have worked with cluster management software maker Adaptive Computing to come up with a rapid dual-boot setup that lets clusters quickly shift nodes from Linux to Windows and back as workloads change. Microsoft's interoperability blog singled out two examples of Windows and Linux getting along: the Rocky Mountain Supercomputing Center in Butte, Montana, which has a modest 3.2 teraflops cluster supporting Red Hat Enterprise Linux and Windows HPC Server, and the Centre for High Performance Computing (CHPC) in Cape Town, South Africa, which runs a mix of Linux, Windows, and Unix clusters under the control of Adaptive Computing's Moab 5.4 workload management tool.
While it is not polite to call someone's supercomputer cluster puny (size is all relative to the job that needs to get done, of course), Microsoft is cooperating with Novell, Adaptive Computing, and others because, at this point, there is no real technical reason why the vast majority of x64 clusters running Linux could not be converted from static Linux machines to dynamic Linux-Windows images, provided there are applications driving HPC shops to consider Windows. The ability to dual-boot quickly is a first step toward getting a broader portfolio of HPC apps running on Windows, and then toward seeing more use of Windows on clusters.
In theory, that gives Microsoft more money and power. It is unclear how snobby HPC shops are about closed source Windows after more than a decade of endorsing open source Linux (and dumping closed source Unixes), but history has shown that at the right price (rapidly approaching zero), HPC customers will happily switch hardware architectures and software platforms.
What does Microsoft get out of this? Every second Windows is running on an HPC cluster node is a second it is not running Linux. What does Novell get out of it? Continued association with Microsoft and its marketing machine and a hope that Novell can become the preferred Linux in the dual-boot cluster world Microsoft is trying to foment.
As El Reg previously reported, the latest Top 500 supercomputer rankings came out this week. Of the 500 machines on the list, 403 use x64 processors from Intel, 47 use x64 processors from Advanced Micro Devices, and five use Itanium processors from Intel. All of these machines, which represent an aggregate of 26.3 petaflops of number-crunching power (or 81.2 per cent of the total oomph embodied in the Top 500 list), could in theory support Windows HPC Server 2008.
Five of the machines actually do use Windows HPC Server as their dominant OS, as you can see from this clever graphic put together by the BBC in its coverage of the Top 500 rankings, and Linux is by far the dominant operating system across all processor architectures used in the supers on the list. The Windows machines deliver about 412.6 teraflops of aggregate performance as measured by the Linpack Fortran matrix math test used to compile the Top 500 rankings, about 1.3 per cent of the 32.4 petaflops on the list. Linux runs on 91 per cent of the machines and accounts for 27.2 petaflops, Unix runs on 4.6 per cent (1.6 petaflops), and another 3.4 per cent have mixed environments (generally a mix of Unix and Linux).
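The share figures above are simple ratios against the 32.4 petaflops total for the whole list. As a back-of-the-envelope check, here is a minimal Python sketch that recomputes two of them from the raw numbers quoted in this article; the constant and function names are purely illustrative.

```python
# Figures quoted in this article for the June 2010 Top 500 list.
TOTAL_PF = 32.4         # aggregate Linpack performance of the whole list, petaflops
X64_ITANIUM_PF = 26.3   # x64 plus Itanium machines, petaflops
WINDOWS_TF = 412.6      # Windows HPC Server machines, teraflops


def share_pct(part_pf: float, total_pf: float = TOTAL_PF) -> float:
    """Return part_pf as a percentage of total_pf, rounded to one decimal place."""
    return round(100.0 * part_pf / total_pf, 1)


# 1 petaflops = 1,000 teraflops, so convert the Windows figure before dividing.
windows_pf = WINDOWS_TF / 1000.0

print(f"x64 + Itanium share: {share_pct(X64_ITANIUM_PF)}%")  # → 81.2%
print(f"Windows share:       {share_pct(windows_pf)}%")      # → 1.3%
```

The only trap is the unit mismatch: the Windows figure is quoted in teraflops and the total in petaflops, so one has to be converted before taking the ratio.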
Windows HPC Server has a long, long way to go to get even a threatening share of installs on the Top 500 list. But Microsoft says that with the R2 update of the code, due to ship later this year, the performance of Windows running Message Passing Interface (MPI) clustering software will be close to parity with Linux. (The code went into its second beta in early April.) Microsoft says that it has thousands of customers who have Windows clusters running real HPC work and that nearly 100 of the key HPC software houses have ported their code to Windows HPC Server too.
The Windows revolution in HPC, if there is indeed one, seems to be coming from the bottom up. But there's some action now at the top too. The Tokyo Institute of Technology is building a 2.4 petaflops hybrid CPU-GPU cluster called Tsubame 2.0 that will dual boot Windows and Linux, and this could be the wave of the future. (The 180.6 teraflops "Magic Cube" Opteron cluster at the Shanghai Supercomputer Center in China is currently the largest Windows cluster in the world.)
As the rapid rise of Linux in the HPC community shows, if there is any compelling advantage to using a different piece of software or hardware, these HPC folks are just the ones who will drop any technology like a hot potato and move on to something else. If Microsoft can come up with tools that do a better job of dispatching work to GPUs than the Linux stack does, this could do the trick. ®