Microsoft morphs HPC efforts into 'Big Compute'

More than traditional supercomputing, and all about big gobs of CPU

A few years back, Microsoft was all gung-ho to try to take on Linux in the high performance computing market, and did a pretty good job with Windows Server 2008 in drawing to performance parity on traditional HPC workloads. But now, in the wake of the massive reorganization by CEO Steve Ballmer, the HPC efforts are being rolled up into a new Big Compute team.

The Big Compute team is part of the Cloud and Enterprise Engineering Group. Big Compute includes people from the Enterprise and Cloud Division and the Windows Azure Division at Microsoft, explained Alex Sutton, who is the group program manager for Big Compute, in a blog post.

Big Compute is a term that was coined to resonate with Big Data and, in many cases, to interplay with it. And Sutton offered a definition to show the distinction between the two: "Big Compute applications typically require large amounts of compute power for hours or days at a time. Some of our customers describe what they are doing as HPC, but others call it risk analysis, rendering, transcoding, or digital design and manufacturing."

Sutton said that the HPC Pack for Windows Server is going to be enhanced going forward, and that Microsoft was working on new "Big Compute scenarios" to run in the Windows Azure cloud. One of those, hinted Sutton, will be to make it easier to fire up a local cluster running Windows Server and burst various kinds of work out to Azure.

"You'll continue to see new features and new releases from us on a regular basis," Sutton said. "The team is really excited about the new capabilities we'll be bringing to our customers."

Microsoft has a few Windows-based systems on the most recent Top500 rankings, including two clusters running Windows HPC Server 2008 that deliver a combined 460.4 teraflops of sustained performance on the Linpack Fortran benchmark from 38,028 cores across the two machines.

Microsoft also set up a virtual cluster and ran the Linpack test on it again. It is called Faenov, in homage to Kyril Faenov, who used to lead Microsoft's HPC efforts before he died in May 2012.

This Azure virtual cluster is based on HP's SL230s Gen8 half-width modular servers, which use Intel's Xeon E5-2670 processors running at 2.6GHz. It has 8,064 cores and uses QDR InfiniBand to link the nodes together; it has 151.3 teraflops of Linpack oomph.
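
For a rough sense of how efficient that is, the core count and clock speed can be turned into a back-of-envelope peak estimate. The sketch below is ours, not Microsoft's: it assumes each Xeon E5-2670 core can retire eight double-precision floating point operations per clock using AVX, a figure that is not stated in the announcement, and it ignores Turbo Boost.

```python
# Back-of-envelope Linpack efficiency for the Faenov virtual cluster.
CORES = 8_064            # core count quoted in the article
CLOCK_GHZ = 2.6          # base clock quoted in the article
FLOPS_PER_CYCLE = 8      # ASSUMPTION: AVX double precision per core, not from the article
LINPACK_TFLOPS = 151.3   # sustained Linpack figure quoted in the article


def peak_tflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical peak in teraflops (cores x GHz x flops/cycle is gigaflops)."""
    return cores * clock_ghz * flops_per_cycle / 1_000


def efficiency(sustained_tflops, peak):
    """Fraction of theoretical peak actually sustained on the benchmark."""
    return sustained_tflops / peak


peak = peak_tflops(CORES, CLOCK_GHZ, FLOPS_PER_CYCLE)
print(f"peak ~ {peak:.1f} TF, efficiency ~ {efficiency(LINPACK_TFLOPS, peak):.0%}")
```

Under that assumption the theoretical peak works out to about 167.7 teraflops, so the 151.3 teraflops Linpack result would be roughly 90 per cent of peak.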

The Tsubame 2 supercomputer is technically a dual-boot machine, running Windows or Linux across its nodes, but in the Top500 rankings it is listed as a Linux beast with 1.19 petaflops of sustained Linpack performance extracted from the 73,279 Xeon 5600 cores and 56,996 Nvidia Fermi cores it contains.

Redmond clearly had high hopes to break into supercomputing bigtime with the launch of Windows HPC Server 2008 R2 back in November 2009 at the SC09 supercomputing extravaganza. At the time, Microsoft was looking not only at traditional HPC jobs like you see at the national labs and research centers, but adjacent HPC-style work – now called big compute – that would allow a cluster of servers to act as a parallel accelerator for Excel 2010.

Microsoft estimated that there were hundreds of millions of "heavy users" of Excel that might benefit from having a server or a cluster to offload workbook crunching to. It is unclear how that effort panned out, but it certainly has not become common to have a cluster back-ending an Excel workbook as far as El Reg knows.

There is no HPC edition any more, but there is an HPC Pack. Back in December, Microsoft updated those add-ons to Windows, which provide math libraries and a Message Passing Interface (MPI) communication stack that rivals the performance of Linux equivalents. The December update, the fourth release of this software, added support for Windows Server 2012 for various kinds of cluster nodes and for Windows 8 as clients and workstation nodes for cycle harvesting.

Microsoft has worked with Mellanox Technologies to tune up Windows to take better advantage of Remote Direct Memory Access (RDMA), which allows for server nodes in a cluster to access each other's main memory without having to go through the network stack of the operating system.

This significantly lowers node-to-node latency, which is becoming important not just for traditional HPC workloads but also for infrastructure clouds, database clusters, and virtual desktop infrastructure.

You can run RDMA on InfiniBand or, over Ethernet networks, use RDMA over Converged Ethernet (RoCE). With the SMB Direct protocol for file serving, VDI workloads take less server oomph and you can get more hosted Windows PC slices onto a physical server.

"Performance and scale remains at our heart," explained Sutton. "Because we are part of the Windows Azure Group, we are driving capabilities like low-latency RDMA networking in the fabric. We make sure that customers can reliably provision thousands of cores to run their compute jobs."

"Windows Azure is evolving to provide a range of capabilities for Big Compute and Big Data. High memory instances help more applications run in the cloud, and we are tuning our RDMA and MPI stacks with partners now."

The Big Compute team will also work with Microsoft Research and its partners in academia and the supercomputing labs, says Sutton, but he did not get into specifics. ®
