ScaleMP: Still making big ones out of small ones
Disaggregation - reducing aggravation?
ScaleMP’s vSMP Foundation for SMP software is now certified to run on IBM’s newest x86 boxes, according to TPM’s story here.
Like Tim, I’ve been a bit surprised that ScaleMP is still running free and hasn’t been bought by one of the big guys. It’s good news for both parties that ScaleMP’s stuff is now officially blessed by IBM: reading about it caused an almost coherent stream of thought about cause/effect and futures.
[Begin almost coherent thought stream]
To me, this is another sign pointing to the trend toward IT disaggregation.
Technology like ScaleMP’s (along with that of others) allows individual systems to easily and efficiently use resources that are physically located on other systems. With ScaleMP, it’s memory; with someone like NextIO, it’s a box that gives a bunch of servers access to up to 24 PCIe I/O devices (you can mix ‘n match InfiniBand, GPU, SSD, etc.).
What’s happening here is that we’re moving beyond the boundaries and limitations imposed by the motherboard and system bus architectures – that’s obvious.
The reasons behind the trend are also obvious. At the base level, it’s a drive toward achieving higher utilization of assets. When something like an uber-fast storage array or a GPU is attached to a network, it can be used by others. When it’s locked to a single server, then its utilization is dictated by the load on that single system.
Right now, I have seen only Dell and NextIO put forward a solution to share GPUs and other PCIe devices. But I would expect that by this time next year, we’re going to see a plethora (or at least a half-plethora) of devices that attempt to do the same thing. We’ll also see better virtualization mechanisms to enable more sharing with less overhead.
Another way to look at this trend is to think of it as the death of the balanced system concept. A while back, vendors and customers alike were talking about how systems had to somehow be ‘balanced’ – meaning that processing, I/O, memory, and storage were in some kind of ratio that was just ‘right’ somehow.
The question that always came to my mind (and out of my mouth, if I was in a prickly mood) was, “Balanced for what?” The answer from the vendor or customer was either something completely vague, like “You know, like for computing and stuff” or vaguely specific, as in “For database and application serving workloads.”
So what is a balanced system, and how do you know if you have one? With true balance, we’d see a system without bottlenecks – or, more accurately, a system where CPU, memory, and I/O all hit their max limits at the same time.
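To make that definition concrete, here is a minimal sketch of a bottleneck check. The server capacities and per-instance workload demands are made-up numbers for illustration only, and the function names are hypothetical. It stacks copies of a workload onto one box until some resource runs out, then reports how busy everything else is at that point. By the definition above, a system is balanced for a workload only if every resource comes out at 100 percent.

```python
# Hypothetical capacities for one server (illustrative numbers, not from any vendor).
CAPACITY = {"cpu_cores": 16, "memory_gb": 64, "io_mbps": 1000}

# Hypothetical per-instance demands for two workloads.
IO_HEAVY = {"cpu_cores": 1, "memory_gb": 2, "io_mbps": 250}
BALANCED = {"cpu_cores": 4, "memory_gb": 16, "io_mbps": 250}


def max_instances(capacity, demand):
    """Return (number of workload copies that fit, resource that runs out first)."""
    limits = {r: capacity[r] // demand[r] for r in demand if demand[r] > 0}
    bottleneck = min(limits, key=limits.get)
    return limits[bottleneck], bottleneck


def utilization_at_limit(capacity, demand):
    """Fraction of each resource in use once the bottleneck resource is exhausted."""
    n, _ = max_instances(capacity, demand)
    return {r: n * demand.get(r, 0) / capacity[r] for r in capacity}


for name, demand in (("io_heavy", IO_HEAVY), ("balanced", BALANCED)):
    count, bottleneck = max_instances(CAPACITY, demand)
    print(name, count, bottleneck, utilization_at_limit(CAPACITY, demand))
```

With these numbers the I/O-heavy workload saturates the I/O channel while the processors sit at 25 percent and memory at 12.5 percent, which is the "balanced for what?" problem in miniature.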
The reality is that workloads aren’t balanced; they’re lumpy. There are apps that will saturate I/O channels while only slightly stressing processors and memory. There are other apps that will soak up every cycle and GB of memory in the system and barely touch I/O at all.
The only way to maximize throughput, efficiency, and utilization is to match the systems to the workloads. In other words, we need technology that disaggregates CPU, memory, and I/O so that each can be used by whatever workload needs it most at any given time.
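As a rough illustration of why pooling helps, the sketch below places three lumpy workloads first onto two separate servers and then onto the same hardware treated as one pool. The server sizes, workload sizes, and function names are all assumptions invented for this example, and the pooled case ignores the real-world overhead of reaching across boxes for memory or I/O.

```python
# Two identical hypothetical servers and three hypothetical workloads.
SERVERS = [{"cores": 8, "memory_gb": 32}, {"cores": 8, "memory_gb": 32}]
WORKLOADS = {
    "A": {"cores": 6, "memory_gb": 4},   # CPU-heavy
    "B": {"cores": 6, "memory_gb": 4},   # CPU-heavy
    "C": {"cores": 2, "memory_gb": 40},  # memory-heavy, bigger than any one box
}


def place_fixed(servers, workloads):
    """First-fit placement onto separate servers.

    Returns (names of workloads that fit nowhere, leftover capacity per server).
    """
    free = [dict(s) for s in servers]
    unplaced = []
    for name, need in workloads.items():
        for s in free:
            if all(s[r] >= need[r] for r in need):
                for r in need:
                    s[r] -= need[r]
                break
        else:
            unplaced.append(name)
    return unplaced, free


def place_pooled(servers, workloads):
    """Treat all servers as a single pool of each resource (overhead ignored)."""
    pool = {r: sum(s[r] for s in servers) for r in servers[0]}
    unplaced = []
    for name, need in workloads.items():
        if all(pool[r] >= need[r] for r in need):
            for r in need:
                pool[r] -= need[r]
        else:
            unplaced.append(name)
    return unplaced, pool


print("fixed: ", place_fixed(SERVERS, WORKLOADS))
print("pooled:", place_pooled(SERVERS, WORKLOADS))
```

In the fixed case, workload C cannot run anywhere even though 56 GB of memory sits idle across the two boxes. In the pooled case all three workloads fit with capacity to spare.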
This isn’t a new concept; it’s been around for a long time in the mainframe. In fact, mainframes will allow users to assign a response time requirement for an application, and the system does everything necessary to make sure that it happens – even at the expense of lower priority workloads.
It was hard enough to figure this stuff out and implement it in the 70s and 80s with complete control over the hardware, OS, and much of the software stack. It’s going to take a while for this to become a reality in the open systems world, since no one owns all of the pieces. However, it’s going to happen. It’s happening now.
[End almost coherent thought stream]
COMMENTS
stranded capacity
SGI uses a term in their CloudRack documentation, "stranded power". I've stolen it and applied it to resource utilization: stranded capacity (whether it's CPU, memory, I/O, etc.). Reducing or eliminating stranded capacity is what it's all about these days: driving up capacity utilization by reducing those islands of capacity that you can't use because of some other constraint (maybe you have plenty of CPU but not enough memory, for example).
The recent advances in CPU tech pound this home even further: most apps simply do not scale to be able to tax a typical 8-24 core server by themselves. From what I've seen, it will be some time until they do. That level of optimization is pretty complex, and the priorities just aren't there when "workarounds" like virtualization can fill the gap in the meantime.
I wrote recently about "testing the limits of virtualization" (google it), where I go off on a tangent and explore pushing the hypervisors to their limits. I've spent a lot of time this year thinking about how best to drive capacity utilization upwards, and the savings you can get are pretty amazing, even when compared against "legacy" strategies such as deploying "cheap white box" servers for your apps.
