ScaleMP (finally) glues together 128 Opteron servers

The 8,192 core, 64TB AMD behemoth


Servers are going virtual these days, so maybe it is time for server chipsets and interconnects to do the same.

With Advanced Micro Devices not building any chipsets that go beyond four Opteron processor sockets in a single system image – and no one else interested in doing chipsets, either – there is an opportunity, it would seem, for someone to make big wonking Opteron boxes to compete against RISC and Itanium machines.

Many have tried. Newisys, Liquid Computing, Fabric7, 3Leaf Systems, and NUMAscale all took very serious runs at it, and thus far, four out of five of them have gone the way of all flesh. It is no coincidence that these companies failed: they required customers to invest in expensive hardware to turn many Opteron nodes into a big, often virtualized, single system image.

Since its founding in 2003, ScaleMP has tried a different approach. Instead of using special ASICs and interconnection protocols to lash multiple server nodes together into a shared memory system, ScaleMP cooked up a special hypervisor layer, called vSMP, that rides atop the x64 processors, memory controllers, and I/O controllers in multiple server nodes. Rather than carve up a single system image into multiple virtual machines, vSMP takes multiple physical servers and, using InfiniBand as a backplane interconnect, makes them look like a giant virtual SMP server with a shared memory space. vSMP has its limits. It only runs on Linux and doesn't do Windows. And up until today, it was only supported on Intel's Xeon processors, not Opterons.
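To make "many physical servers, one memory space" a little more concrete, here is a toy model of the bookkeeping involved. It assumes, purely for illustration, that each node's RAM is concatenated in node order into one flat address range; the `GlobalAddressMap` class and its `locate` method are hypothetical and are not how ScaleMP documents vSMP. A real system also has to keep caches coherent and shuttle memory across the InfiniBand fabric, which this sketch ignores entirely.

```python
from bisect import bisect_right
from itertools import accumulate


class GlobalAddressMap:
    """Toy model of one flat memory space built from several servers' RAM.

    Hypothetical: assumes each node's memory is concatenated in node order.
    This is an illustration, not how ScaleMP's vSMP is documented to work.
    """

    def __init__(self, node_mem_bytes):
        # node_mem_bytes[i] is the RAM contributed by node i
        self.sizes = list(node_mem_bytes)
        # starts[i] is the first global address owned by node i
        self.starts = [0] + list(accumulate(self.sizes))[:-1]
        self.total = sum(self.sizes)

    def locate(self, addr):
        """Map a global address to (node index, offset within that node)."""
        if not 0 <= addr < self.total:
            raise ValueError("address outside the aggregated memory space")
        # The owning node is the last one whose start address is <= addr
        node = bisect_right(self.starts, addr) - 1
        return node, addr - self.starts[node]


GiB = 2**30
image = GlobalAddressMap([64 * GiB, 128 * GiB])  # hypothetical two-node image
print(image.total // GiB)       # 192
print(image.locate(100 * GiB))  # (1, 38654705664), i.e. 36GiB into node 1
```

Mixing a 64GiB node with a 128GiB node here is deliberate: nothing in the model requires the nodes to be identical.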

Better late than never

The DDR3 memory controller etched into the Opteron 6100 processor has a 768GB upper address limit, and for many four-way machines, the way the memory slots work out, 512GB is the practical upper limit. With the impending "Interlagos" Opteron 6200s, due for launch before year's end, AMD will hopefully goose the addressable main memory to at least 2TB, if not more.
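One plausible way those two ceilings line up, offered as back-of-envelope arithmetic rather than an AMD specification: assume each Opteron 6100 socket drives four DDR3 channels with up to three DIMMs per channel, and that 16GB was the largest DIMM you could buy. Filling every slot on a four-socket box then gives 768GB, while boards wired for only two DIMMs per channel top out at 512GB. The `max_memory_gb` helper and every input to it are illustrative assumptions.

```python
def max_memory_gb(sockets, channels_per_socket, dimms_per_channel, dimm_gb):
    """Capacity when every DIMM slot holds the largest DIMM available."""
    return sockets * channels_per_socket * dimms_per_channel * dimm_gb


# Assumptions: four DDR3 channels per Opteron 6100 socket, 16GB DIMMs as the
# largest part, and a four-socket board.
all_slots = max_memory_gb(4, 4, 3, 16)        # three DIMMs per channel
two_per_channel = max_memory_gb(4, 4, 2, 16)  # boards with fewer slots
print(all_slots, two_per_channel)  # 768 512
```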

But even if AMD does, those four-socket Opteron 6200 boxes might be a bit pricey, and that's as far as they are going to scale. If you need more memory, and more I/O and CPU oomph behind it, you are outta luck. You have to either parallelize your workloads or move to an eight-socket or, if you are lucky and can find one, a sixteen-socket Xeon box. Or you can get vSMP from ScaleMP and use a bunch of smaller and cheaper two-socket boxes (or maybe even single-socket boxes) to create a virtual fat memory system.

The earlier releases of vSMP could scale across 16 nodes and up to 4TB of aggregate main memory (InfiniBand is still preferred to Ethernet as the backplane interconnect), but with vSMP Foundation 3.0, launched in May 2010, the company expanded the underlying hypervisor to support up to 128 nodes and 64TB of memory in a single image.

This version of vSMP is now supported on Opteron-based servers, not just those based on Intel Xeons. ScaleMP is supporting nodes based on either the current 8-core and 12-core "Magny-Cours" Opteron 6100s or the 12-core and 16-core Opteron 6200s. The virtual machine manager at the heart of vSMP can currently scale to 128 nodes. Depending on the cores per chip and the generation you use, you can have from 2,048 to 8,192 cores in a single image.
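The core-count range is simple multiplication, though the article does not say which node configurations produce its endpoints. A sketch, assuming two-socket nodes with 8-core Opteron 6100s at the low end and four-socket nodes with 16-core Opteron 6200s at the high end; the `cores_in_image` helper is hypothetical. Dividing the 64TB ceiling by 128 nodes also shows that each node would need to contribute 512GB, which matches the practical four-way limit mentioned earlier.

```python
NODES = 128           # vSMP Foundation 3.0 node limit cited in the article
IMAGE_MEMORY_TB = 64  # largest single-image memory cited in the article


def cores_in_image(nodes, sockets_per_node, cores_per_socket):
    """Total cores a single vSMP image would expose."""
    return nodes * sockets_per_node * cores_per_socket


# Assumed endpoints: two-socket nodes with 8-core Opteron 6100s at the low end,
# four-socket nodes with 16-core Opteron 6200s at the high end.
low = cores_in_image(NODES, 2, 8)
high = cores_in_image(NODES, 4, 16)
memory_per_node_gb = IMAGE_MEMORY_TB * 1024 // NODES
print(low, high, memory_per_node_gb)  # 2048 8192 512
```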

For machines that large, you would no doubt need a very fast InfiniBand fabric to make it work well. The limiting factor in the Opteron support is AMD's homegrown chipsets, which launched two years ago ahead of the Opteron 6100s. The SR5690, SR5670, and SR5650 I/O hubs and their companion SP5100 southbridge are all supported by the vSMP hypervisor.

The vSMP hypervisor that glues systems together is not for every workload, but it can pay off on workloads where there is a lot of message passing between server nodes, such as financial modeling, supercomputing, data analytics, and similar parallel workloads. Shai Fultheim, the company's founder and chief executive officer, says ScaleMP has over 300 customers now. "We focused on HPC as the low-hanging fruit," Fultheim tells El Reg, "but these days we are doing business analytics and virtualization consolidation."

That latter one might crack you up a bit. You put vSMP on a bunch of servers to glue them together, and then you use a hypervisor like VMware's ESXi or Red Hat's KVM to cut it up into virtual slices. The benefit of this way of doing it is that you can build fat VM instances using skinny servers, and some people think the economics makes sense and are giving this idea a whirl.

ScaleMP needs to get Windows supported on vSMP in addition to adding Opteron support, which quite frankly would have been more useful two years ago when the Opteron 6100s came out. You could also make the case that vSMP would be useful on skinnier (and cheaper) Opteron 4100 nodes for certain kinds of workloads, like those that are sensitive to clock speed as well as memory capacity.

Chart: ScaleMP vSMP Triad benchmark on AMD. The Triad memory test scales linearly on vSMP on Opteron servers.
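We assume the "Triad" test in the chart is the Triad kernel from John McCalpin's STREAM memory-bandwidth benchmark, which computes a[i] = b[i] + q*c[i] over arrays too big for cache and counts 24 bytes of traffic per element (two 8-byte reads, one 8-byte write). A minimal sketch of that kernel and its byte accounting follows; the function names are hypothetical, and because it is pure Python the number it prints reflects interpreter overhead, not what a memory bus can do. The real benchmark is compiled C or Fortran. A "linear" result means aggregate bandwidth grows in step with the number of nodes.

```python
import time
from array import array


def triad(b, c, q):
    """STREAM Triad kernel: a[i] = b[i] + q * c[i]."""
    return array("d", (bi + q * ci for bi, ci in zip(b, c)))


def triad_bandwidth_gbs(n, q=3.0):
    """Run Triad once over n doubles and report STREAM-style GB/s."""
    b = array("d", [1.0]) * n
    c = array("d", [2.0]) * n
    start = time.perf_counter()
    a = triad(b, c, q)
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid dividing by zero
    # STREAM counts 24 bytes per element for Triad: read b, read c, write a
    bytes_counted = 3 * 8 * n
    return a, bytes_counted / elapsed / 1e9


a, gbs = triad_bandwidth_gbs(1_000_000)
print(a[0], f"{gbs:.2f} GB/s")  # a[0] is 1.0 + 3.0 * 2.0 = 7.0
```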

ScaleMP will ship vSMP for Opteron servers on October 1. It will be available in the same three flavors as the Xeon version of the hypervisor. vSMP Foundation for Cluster takes multiple server images and plunks them onto a single server image running one copy of a Linux operating system; you use vSMP and that operating system instead of a cluster manager to run workloads. You don't aggregate memory in this case.

vSMP Foundation for SMP is a slightly different tweak on the vSMP hypervisor that is designed to create a big shared memory space (often asymmetrically, mixing server nodes with skinny and fat physical main memories to get the desired balance of CPU core count, memory capacity, and cost) for applications to run in. And vSMP Foundation for Cloud has user-based pricing and is aimed at public and private clouds that want to aggregate VMs atop a virtual shared memory system, giving them more configuration options than they can get from two-socket or four-socket server nodes by themselves. Most cloud providers using vSMP, says Fultheim, deploy it on only about 20 per cent of their nodes.
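The asymmetric trick is about hitting a target memory-to-core ratio without buying fat memory for every node. A toy tally, assuming a hypothetical mix of 24-core two-socket nodes where four carry 32GB and four carry 256GB; the `Node` type and `aggregate` helper are illustrative, and the sketch ignores the extra latency of reaching memory on another node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    cores: int
    memory_gb: int


def aggregate(nodes):
    """Cores, memory and GB-per-core that one shared-memory image would expose."""
    cores = sum(n.cores for n in nodes)
    memory_gb = sum(n.memory_gb for n in nodes)
    return cores, memory_gb, memory_gb / cores


# Hypothetical mix of two-socket, 24-core nodes: four skinny, four fat.
nodes = [Node(24, 32) for _ in range(4)] + [Node(24, 256) for _ in range(4)]
cores, memory_gb, gb_per_core = aggregate(nodes)
print(cores, memory_gb, gb_per_core)  # 192 1152 6.0
```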

Pricing for vSMP depends on the scenario and is based on a percentage of the infrastructure costs customers have as they build clusters and virtual SMPs. For the cluster configuration, ScaleMP is charging about 20 per cent of the cost of the underlying iron, and on the shared memory SMP setups, it works out to about 30 per cent of the cost of the iron. For clouds, where nodes are not always ganged up, ScaleMP is charging 5 to 10 per cent of the infrastructure costs.

"We believe that this reflects the value customers get from the software," says Fultheim. While this may be true, it is probably better to figure out what those percentages work out to on average and just put a price tag on it. ®
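The percentage-of-iron pricing described above reduces to simple arithmetic. A rough sketch using the approximate rates ScaleMP quoted; the `license_estimate` helper is hypothetical, and the $8,000 per-node hardware price is a made-up figure for illustration only. Actual deals may well differ from these round percentages.

```python
# Rough license rates quoted in the article, in per cent of hardware cost.
RATES_PCT = {
    "cluster": (20, 20),
    "smp": (30, 30),
    "cloud": (5, 10),
}


def license_estimate(flavor, iron_cost):
    """Return (low, high) estimated vSMP license cost for a given hardware spend."""
    low_pct, high_pct = RATES_PCT[flavor]
    return iron_cost * low_pct / 100, iron_cost * high_pct / 100


# Hypothetical: 128 nodes at an assumed $8,000 each.
iron = 128 * 8_000
for flavor in RATES_PCT:
    print(flavor, license_estimate(flavor, iron))
```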
