Intel and VMware get their RAS on

Server memory futures

IDF Not content with adding comprehensive power-saving features to its processors, Intel is working on extending advanced power management for server memory as well as improving memory-error protection.

These improvements fall under two umbrellas: RAS - a term originated 'way back when' by IBM to denote technologies related to reliability, availability, and serviceability - and Memory Power Management.

The future of both was detailed during a tag-team tech talk by VMware's Rich Brunner and Intel's Sunil Saxena at this week's Intel Developer Forum in San Francisco.

Brunner succinctly described the need for RAS as applied to memory: "Right now, just one hardware or software error can cause a platform to reboot. And the hundreds or potentially thousands of virtual machines that you might be running on that platform? They've all got to go down and reboot. I think in the vernacular we say, 'That sucks.'"

The need for RAS is growing because of the increased amounts of memory found in x86 systems. "Because of these scale-up servers," Brunner said, "we end up having an immense memory footprint - a tremendous amount of memory from an x86 perspective. One terabyte from an x86 perspective is a lot of memory."

Although Memory RAS encompasses such familiar concepts as ECC-protected cache and memory, Brunner and Saxena focused on recovery from uncorrectable memory errors (UMEs) and the ability to shut down unused memory, thus saving power.

Managing UMEs is enhanced in Intel's upcoming Nehalem EX processor, which should appear early next year on a motherboard near you. That eight-core multi-socket beast will include a Memory Patrol Scrubber, which Brunner described as an "autonomous little engine in the background that is constantly fetching and updating ECC patterns," adding that "It's controlled completely by firmware - [software] doesn't have any interaction with it. It's a pretty cool little thing."

If the Memory Patrol Scrubber finds an uncorrectable error, it can instruct the Nehalem EX to "contain" that memory - meaning to wall it off from use or from being written to disk - and the software managing the processor can then decide how serious the problem is, and either go down or simply terminate the affected application or virtual machine and go on its merry way.

"In previous systems," Brunner said, "the hardware and the operating system would have no choice but to crash."
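The recovery choice described above can be sketched in a few lines of Python. This is only an illustration, not Intel's or VMware's code: the `ContainedError` record, the `hits_hypervisor` flag, and the rule that an error in the managing software's own memory is the "serious" case are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    GO_DOWN = "take the whole platform down"
    TERMINATE_OWNER = "terminate the affected application or VM"


@dataclass
class ContainedError:
    page: int              # hypothetical page number flagged by the scrubber
    owner: str             # hypothetical name of the VM or application using it
    hits_hypervisor: bool  # assumption: True if the managing software's own memory is hit


def decide(err: ContainedError) -> Action:
    """Pick a response once the hardware has contained the bad memory."""
    # Assumed severity rule: the managing software cannot survive the loss
    # of its own memory, so the platform goes down.
    if err.hits_hypervisor:
        return Action.GO_DOWN
    # Otherwise only the owner of the bad page is sacrificed.
    return Action.TERMINATE_OWNER


print(decide(ContainedError(page=4096, owner="vm-17", hits_hypervisor=False)).value)
```

The point of the sketch is the shape of the decision: containment turns a guaranteed crash into a choice made by software.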

Advanced Memory Power Management technologies are further down the road, but Intel and VMware have a few tricks under investigation. According to Saxena, for example, Intel is currently experimenting with three new memory states and how they might reduce the power needed to run memory.

The three experimental states are Memory Self Refresh, Memory Standby, and Memory Offline. All three use the power-management technology known as ACPI (Advanced Configuration and Power Interface) to set a server's memory into the appropriate state depending upon its needs.

Saxena claimed that memory in self-refresh mode requires about half the amount of power of memory in the standard "active idle" mode. Memory on standby would use only about a third of active-idle power, and memory that's placed offline - to no surprise - requires no power whatsoever.
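To get a feel for what those ratios mean, here is a rough Python sketch that estimates memory power relative to an all-active-idle baseline. The ratios are the approximate figures Saxena quoted; the state names, the `relative_power` helper, and the assumption that every memory region draws the same power when active-idle are illustrative only.

```python
# Approximate power per state, as a fraction of "active idle" power
# (figures as quoted in the talk; the names are made up for this sketch).
RELATIVE_POWER = {
    "active_idle": 1.0,
    "self_refresh": 0.5,
    "standby": 1 / 3,
    "offline": 0.0,
}


def relative_power(regions: list[str]) -> float:
    """Return total power as a fraction of the all-active-idle baseline.

    Assumes every region draws equal power when active-idle.
    """
    return sum(RELATIVE_POWER[state] for state in regions) / len(regions)


# Eight equal regions: four busy, two in self-refresh, two offline.
mix = ["active_idle"] * 4 + ["self_refresh"] * 2 + ["offline"] * 2
print(f"{relative_power(mix):.3f} of baseline power")
```

With that mix, memory draws 62.5 per cent of the baseline, a saving of 37.5 per cent.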

The controls work at the memory riser-card level. A four-socket Nehalem EX has eight riser cards, two per socket; each riser card has two Millbrook controllers, and each Millbrook controls four DIMMs. If each of those DIMMs is an 8GB module, the total memory will be half a terabyte - and half a terabyte requires a lot of power to keep lit.
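The half-terabyte figure follows directly from that topology, as this short Python check shows. The constant names are invented for the sketch; the counts are those given above.

```python
SOCKETS = 4
RISERS_PER_SOCKET = 2
CONTROLLERS_PER_RISER = 2   # Millbrook controllers per riser card
DIMMS_PER_CONTROLLER = 4
GB_PER_DIMM = 8

riser_cards = SOCKETS * RISERS_PER_SOCKET
total_dimms = riser_cards * CONTROLLERS_PER_RISER * DIMMS_PER_CONTROLLER
total_gb = total_dimms * GB_PER_DIMM

print(f"{riser_cards} riser cards, {total_dimms} DIMMs, {total_gb}GB")
```

That is 64 DIMMs and 512GB in total, or 64GB behind each riser card, which is the unit the power controls act on.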

Brunner said that VMware has an experimental build of its vSphere virtualization suite that takes advantage of the equally experimental trio of memory states supported by Intel hardware. Essentially, the hardware and software work together to inform each other which memory pages are in use and which aren't needed and can thus be set to lower-power states. Brunner said that tests in VMware's labs have shown power savings of up to 75 per cent in certain situations.

VMware is also experimenting with another technology that can swap memory pages around, essentially defragmenting memory banks in much the same way a hard drive is defragged. By doing so, memory pages in use by VMs or hypervisors could be consolidated onto some riser cards, allowing other riser cards to be taken offline entirely.
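The consolidation idea can be illustrated with a simple greedy packing in Python. This is a toy model, not VMware's algorithm, which the talk did not describe: the `consolidate` function, the page counts, and the per-card capacity are all hypothetical.

```python
def consolidate(used: list[int], capacity: int) -> tuple[list[int], list[int]]:
    """Pack in-use pages onto as few riser cards as possible.

    used[i] is the number of pages in use on card i; capacity is the number
    of pages each card can hold. Returns the new per-card usage and the
    indices of cards left empty, which could then be taken offline.
    """
    remaining = sum(used)
    if remaining > capacity * len(used):
        raise ValueError("more pages in use than the cards can hold")

    # Fill the fullest cards first so that fewer pages have to move.
    order = sorted(range(len(used)), key=lambda i: used[i], reverse=True)
    new_used = [0] * len(used)
    for i in order:
        new_used[i] = min(capacity, remaining)
        remaining -= new_used[i]

    freed = [i for i, pages in enumerate(new_used) if pages == 0]
    return new_used, freed


new_used, freed = consolidate([30, 10, 50, 5], capacity=64)
print(new_used, freed)
```

In this example the 95 pages in use end up on two cards, and cards 1 and 3 are emptied. A real implementation would also have to weigh the cost of copying pages against the power saved.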

While none of these advanced memory-management techniques will be available in the near future, they could one day add up to massive data-center power savings - not only in the juice required to run the servers, but also in the systems required to keep them running within temperature specs. ®
