Microsoft feeds Excel to supercomputer
Windows HPC chases Linux
SC09 If the quants in financial services get tools for running their models more quickly, will the economy get better or worse? Who knows? But thanks to Microsoft, we're all going to find out.
At the SC09 supercomputing trade show in Portland, Oregon, this week, Microsoft is trumpeting the parallel programming smarts it's baking into future Visual Studio 2010 development tools - and how its Windows HPC Server 2008 R2 will reach parity with Linux performance for MPI-style parallel applications.
But the most interesting thing that Microsoft is actually talking about at the show - well, the second most interesting thing after the full-sized flight simulator in its booth - is a future combination of Excel 2010 and Windows HPC Server 2008 that will allow companies to bolt an x64 cluster to their workstations to radically improve the performance of macro-based models built in Excel workbooks.
This may not seem like a big deal, but it will be. While all kinds of companies use parallel supercomputers to simulate physical objects or to chew through data of one kind or another to make a decision, according to Vince Mendillo, senior director of high performance computing at Microsoft's Server and Tools Business Group, even the most sophisticated users - and plenty of unsophisticated ones - build models in plain old Excel rather than do "real" programming. Real or not, companies have made an enormous investment in their Excel models, and many of them have run up against severe performance barriers.
And so, Microsoft will allow users to lash the future Excel 2010 to a Windows HPC Server 2008 R2 cluster (both programs are in beta now), turning an x64 cluster into an Excel workbook co-processor, radically speeding up the performance of macros running inside the workbooks.
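To make the co-processor idea concrete, here is a minimal Python sketch of the general pattern: independent cell calculations are farmed out to a pool of workers instead of being evaluated one at a time. This is not Microsoft's API. The `udf` function, the `recalc_*` helpers, and the sample inputs are all hypothetical, and a local thread pool stands in for the cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def udf(rate, years, principal=1000.0):
    """Hypothetical stand-in for a workbook's user-defined function."""
    return principal * (1.0 + rate) ** years


def recalc_serial(cells):
    """Evaluate every cell one after another, as a lone workstation would."""
    return [udf(rate, years) for rate, years in cells]


def recalc_offloaded(cells, workers=4):
    """Hand each independent cell to a worker; the pool stands in for cluster nodes."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, so cells line up as before
        return list(pool.map(lambda cell: udf(*cell), cells))


# Hypothetical inputs: eight cells, each an (interest rate, years) pair
cells = [(0.01 * i, 10) for i in range(1, 9)]
results = recalc_offloaded(cells)
print(len(results), "cells recalculated")
```

The approach only pays off when the calculations do not depend on one another, which is presumably why models stuffed with self-contained user-defined functions are the target.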
This may not sound like HPC, but it is if you are making money at it, which Microsoft very likely will. By Mendillo's reckoning, there are an estimated 500 million active Excel users worldwide and somewhere between 50 and 55 million quant workers among them who use spreadsheets to build models of all kinds and across all industries.
Microsoft estimates there are hundreds of millions of "heavy users" of Excel that might benefit from having a server or a cluster to offload workbook crunching duties to. These users have their own user-defined functions buried in their workbooks, which create the models, and in some cases, Mendillo says, shops have 400,000 to 600,000 lines of code buried in their spreadsheet models.
And thus, it comes as no surprise that at one financial services firm, a workbook model that took 45 hours to run on a beefy, high-end Windows workstation finished in about two hours when backed up by a 16-node server cluster.
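For a rough sense of what those figures imply, the arithmetic can be sketched in a few lines of Python. The inputs are the 45 hours, two hours, and 16 nodes quoted above; the function names are ours, and since "about two hours" is an approximation, so are the results.

```python
def speedup(serial_hours, parallel_hours):
    """How many times faster the cluster-backed run was."""
    return serial_hours / parallel_hours


def per_node_efficiency(serial_hours, parallel_hours, nodes):
    """Speedup divided by node count; 1.0 would be perfect linear scaling."""
    return speedup(serial_hours, parallel_hours) / nodes


s = speedup(45.0, 2.0)
e = per_node_efficiency(45.0, 2.0, 16)
print(f"speedup: {s:.1f}x, per-node efficiency: {e:.2f}")
```

A per-node figure above 1.0 does not mean the cluster defied arithmetic. The comparison is between one workstation and 16 server nodes, each of which may well have had more cores or memory than the workstation did, and the article gives no per-core numbers.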
Mendillo also boasted that the Excel 2010-HPC Server combo, which hasn't even shipped yet, was behind the single largest server operating system deal that Microsoft has done in its history. While Mendillo won't say how big the deal was, or who did it, he did say that it was at a financial services firm with very complex Excel models that was also interested in Microsoft's F# programming language, which is a functional language akin to OCaml that quants have taken a shine to.
The F# language is in community technology preview this week and is part of Visual Studio, which is in beta. Windows HPC Server 2008 R2, which is required for the integration with Excel, is in beta now as well, and it includes enhancements to the scheduler and tuning in the Message Passing Interface (MPI) stack used to create an HPC cluster. The MPI stack has optimizations for the latest processors from Intel and Advanced Micro Devices, better MPI debugging, and enhanced support for the Remote Direct Memory Access (RDMA) protocol over Ethernet and InfiniBand.
"We're just as fast as Linux on microkernel and other benchmarks, which is a big change for us over the past three years," says Mendillo, who added that the HPC business was "growing extremely fast" but would not quantify that. "We are getting a lot of consideration on deals we didn't see a year and a half ago. We are getting considered half the time now."
Part of this is due to the fact that there are new companies (particularly in life sciences) and users (particularly in financial services) who do not have experience with Linux and don't want it. They are growing up out of their workstations and into HPC clusters. And they want to keep their data on Windows servers and use Active Directory and they want to use tools like SharePoint to share the results of their calculations too.
To help make a Windows HPC Server cluster less intimidating, Microsoft's System Center tools have been tweaked with the upcoming R2 release to do a better job deploying a cluster. Mendillo says that System Center can deploy a 1,000-node cluster in between 4 and 5 hours. Microsoft has also created a tool nicknamed "The Lizard," short for the Linpack Performance Wizard, that takes the smarts of the best HPC techies at Microsoft and encapsulates them in a set of wizards that automatically optimize a cluster to run the Linpack Fortran benchmark. The idea is that you tune for the benchmark and now your parallel applications will run better, too.
Mendillo would not say when Microsoft would get the recently acquired Star-P application parallelization tools into its stack, but it looks like the tools will go into Windows HPC Server at first, not Visual Studio. Mendillo said that the key people of Interactive Supercomputing are now working in the Microsoft "nerd center" across the street from the Massachusetts Institute of Technology in Cambridge, and that Star-P will be in technology preview with a future release of HPC Server sometime next year.
Microsoft's Windows HPC Server still didn't rank very high on the Top 500 supercomputing list that came out this week at SC09, but the University of Southampton popped onto the list at number 74 running the R2 beta on a 66.8 teraflops cluster made of IBM's iDataPlex iron and using Xeon 5500 processors.
Expect Microsoft to have a much better showing soon, though. The Tokyo Institute of Technology has just inked a deal to do a second-generation cluster based on blade servers from Sun Microsystems, GPU co-processors from Nvidia, and Windows HPC Server 2008 R2. This machine, named Tsubame 2, will replace Tsubame 1, which was composed of Sun Opteron blades and ClearSpeed math co-processors and was rated at 87 teraflops. Tsubame 2 is going to weigh in at a much heftier 3 petaflops of peak performance and will almost certainly rank in the top five when it is delivered in the spring of 2010. It may even take over the top spot. ®