Original URL: https://www.theregister.com/2010/01/12/rna_networks/

Making big ones out of small ones: RNA networks

Memory aggregation

By Dan Olds

Posted in HPC, 12th January 2010 18:02 GMT

Giving users more flexibility in how they configure systems to attack various workloads was a big thread running through SC09 last year. At the show, we took a look at three different companies that are, in one way or another, providing large system images. (Click to see our posts on ScaleMP, 3Leaf, and SGI.)

One company we didn’t get a chance to talk to at SC09 is RNA Networks, a Portland, Oregon-based start-up with a unique take on pasting together small commodity hardware to give it big iron capabilities.

Over the holidays, we ventured downtown to RNA’s headquarters and spent some time with Product Manager Don Whitehead. As we sat down to meet, the steady rain had somehow turned to heavy snow, but we didn’t anticipate any problems driving as the temperatures were so warm that the snow wasn’t going to stick to the roads.

RNA does something it calls memory virtualization, although memory aggregation is probably a more apt description. Its software allows users to dedicate memory on compute nodes to a common pool that can be used by any system on the network. Our pal TPM did a great job of writing up the whats, whys, and hows of RNA here, so we’re not going to duplicate his explanation. The major change since Tim’s story is that RNA has released its RNAcache product and is pushing forward with its sales efforts.
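To make the idea a bit more concrete, here is a deliberately simplified sketch of memory aggregation in Python. To be clear, this is not RNA’s API; the class and method names are invented for illustration, and a real product does this over a fast network fabric rather than with in-process dictionaries. The shape is what matters: each node donates a chunk of its RAM to a common pool, and the software decides which node physically holds each object.

# Illustrative only: a toy, in-process model of memory aggregation.
# These names are invented for this sketch and are not RNA Networks'
# actual API; a real deployment does the lookups over a low-latency
# fabric, not with Python dictionaries.

class Node:
    """A compute node that donates part of its RAM to the shared pool."""
    def __init__(self, name, donated_gb):
        self.name = name
        self.donated_gb = donated_gb
        self.store = {}            # stands in for the donated memory region

class MemoryPool:
    """A common pool stitched together from many nodes' donations."""
    def __init__(self, nodes):
        self.nodes = nodes

    def total_gb(self):
        return sum(n.donated_gb for n in self.nodes)

    def _home(self, key):
        # Decide which node physically holds a given object (simple hashing).
        return self.nodes[hash(key) % len(self.nodes)]

    def put(self, key, value):
        self._home(key).store[key] = value

    def get(self, key):
        # A remote read in real life; a dictionary lookup in this toy.
        return self._home(key).store.get(key)

# 300 servers donating roughly 37GB apiece gives about the 11TB pool
# described later in this piece.
pool = MemoryPool([Node(f"server-{i}", 37) for i in range(300)])
print(pool.total_gb())             # 11,100GB of aggregate shared memory
pool.put("orders:2010-01-12", "...")
print(pool.get("orders:2010-01-12"))

The division of labour is the point: applications see one large memory space, while the software quietly keeps track of which node’s RAM actually holds each piece of data.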

Where RNA’s approach differs from the others mentioned above is that RNA isn’t trying to provide a cache-coherent SMP image built from distributed systems. Its products are aimed at providing large, shared memory spaces only.

The company firmly believes that current quad-core and six-core chips provide more than enough CPU cycles to satisfy the majority of workloads. According to RNA, memory capacity is the major bottleneck, and it has a good point.

While you certainly can throw a lot of memory into a quad-socket system, performance-sensitive users (think financial services, traditional HPC, predictive analytics, Web 2.0, etc.) hit the limits when they try to put huge objects or entire databases in memory.

To maximize performance, they typically have to buy high-end boxes with large numbers of memory slots and then populate them with expensive 8GB DIMMs. Even then, the memory spaces still aren’t large enough for these customers.
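A bit of back-of-the-envelope arithmetic shows why. Assume, purely for illustration, a quad-socket box with 32 DIMM slots (the real count varies by vendor and chipset); the dataset and pool sizes come from the figures quoted later in this article.

# Back-of-the-envelope: how far does one fat box get you?
# The slot count is an assumption for illustration; real quad-socket
# servers vary by vendor and chipset.
dimm_slots = 32
dimm_size_gb = 8                       # the expensive 8GB DIMMs above
max_memory_gb = dimm_slots * dimm_size_gb
print(max_memory_gb)                   # 256GB in a single box

# Compare with the figures quoted later in this article:
dataset_gb = 500                       # datasets of up to half a terabyte
pool_tb = 11                           # the 11TB shared pool at one customer
print(dataset_gb / max_memory_gb)      # ~2x what the box can hold
print(pool_tb * 1024 / max_memory_gb)  # ~44x what the box can hold

Even doubling the slot count only gets a single box to around half a terabyte, which is roughly where the datasets described below top out, and nowhere near an 11TB pool.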

With RNA, these customers can devote big (or small) chunks of memory from distributed systems to the common good. RNA’s admittedly small customer base is doing exactly that – on a very large scale. One installation has 300 servers working off of an 11TB shared memory pool, which is an astounding amount of resource. RNA shared some performance data from its own tests and from customer tests, and the results are profound.

Early adopters

In one case, a customer using RNAmessenger was able to increase transactions per second by over 8x, giving them an edge in trading. As importantly, the performance was highly predictable, with very little variation in latency under wildly varying transaction loads. RNA also showed me a case where use of RNAcache improved time to completion by 20x on average, with datasets that varied from 125GB to half a terabyte.

Early adopters have tended to be financial services companies, for whom performance is king. For them, it’s worth almost any amount of money to get fast time-to-solution and time-to-execution. The ability to use commodity boxes in the RNA solution is a bonus. For other customers, like those who will be adopting predictive analytics in a big way, the TCO angle will loom larger, I think. In appealing to the broader market, RNA has a strong pitch based on application performance, price/performance, and the business benefits arising from the capabilities its software can provide. However, there are clouds on the horizon.

The spate of companies offering mechanisms to turn small boxes into large virtual systems didn’t come about by chance – they’re responding to problems they see arising from the straitjacket that is x86 system architecture. We know for a fact that Intel and the major system vendors see and understand this problem.

With this in mind, how long will it be before they do something about it by uncoupling x86 system components and allowing them to be scaled independently? It’s a big change in the architecture, one that would make x86 more mainframe-like than anything else. While it would take a lot of effort and potentially require significant changes in the entire x86 ecosystem, it’s certainly possible to pull it off.

X86 marks the spot

If Intel and its partners came up with an architectural change that decoupled x86 system components, it would certainly cause a lot of difficulty for companies like RNA, ScaleMP, 3Leaf, and a host of others.

However, there are some factors arguing against this. First is the return-on-investment question. This kind of change would be expensive to engineer and wouldn’t have an immediately apparent return – in other words, there isn’t some huge set of customers who will suddenly buy loads of gear just because of the new architecture.

The vast majority of the customers who need this capability are already buying Intel-based systems and will continue to do so. Their need to have systems that aren’t bound by the x86 motherboard is clearly there, but it’s not a universal need for everything running in the data center.

Achieving maximum theoretical efficiency in terms of utilization and performance requires that all inputs to production (CPU, memory, storage, I/O) can be independently and dynamically scaled to fit each and every workload. However, getting anywhere near this level from where we’re at right now won’t be quick, easy, or cheap.

Plus there are other ways to get around architectural restrictions: on a grand scale, cloud computing is a potential solution that’s available today; at a lower level, there are mechanisms like those we’ve talked about above. I think companies like RNA, ScaleMP, 3Leaf, and others are pushing innovations that can potentially fuel big changes in our definition of what a ‘system’ really is. It will be interesting to watch the evolution…

Oh, and the snow – yeah, it stuck. Five inches of it in one afternoon: a freak occurrence for Portland, Oregon. Our 25-minute drive back from RNA Networks took 2.5 hours, which, sadly, was not a freak occurrence here in the land of hills, ice, and residents who abandon their cars at the first sight of a snowflake. Fortunately, the GCG Mobile Analysis Unit turned in another flawless road performance.