Intel and VMware get their RAS on
Server memory futures
IDF Not content with adding comprehensive power-saving features to its processors, Intel is working on extending advanced power management for server memory as well as improving memory-error protection.
These improvements fall under the umbrellas of RAS - a term originated 'way back when' by IBM to denote technologies related to reliability, availability, and serviceability - and Memory Power Management.
The future of both was detailed during a tag-team tech talk by VMware's Rich Brunner and Intel's Sunil Saxena at this week's Intel Developer Forum in San Francisco.
Brunner succinctly described the need for RAS as applied to memory: "Right now, just one hardware or software error can cause a platform to reboot. And the hundreds or potentially thousands of virtual machines that you might be running on that platform? They've all got to go down and reboot. I think in the vernacular we say, 'That sucks.'"
The need for RAS is growing because of the increased amounts of memory found in x86 systems. "Because of these scale-up servers," Brunner said, "we end up having an immense memory footprint - a tremendous amount of memory from an x86 perspective. One terabyte from an x86 perspective is a lot of memory."
Although Memory RAS encompasses such familiar concepts as ECC-protected cache and memory, Brunner and Saxena focused on recovery from uncorrectable memory errors (UMEs) and the ability to shut down unused memory, thus saving power.
Managing UMEs is enhanced in Intel's upcoming Nehalem EX processor, which should appear early next year on a motherboard near you. That eight-core multi-socket beast will include a Memory Patrol Scrubber, which Brunner described as an "autonomous little engine in the background that is constantly fetching and updating ECC patterns," adding that "It's controlled completely by firmware - [software] doesn't have any interaction with it. It's a pretty cool little thing."
If the Memory Patrol Scrubber finds an uncorrectable error, it can instruct the Nehalem EX to "contain" that memory - meaning to wall it off from use or from being written to disk - and the software managing the processor can then decide how serious the problem is, and either go down or simply terminate the affected application or virtual machine and go on its merry way.
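To make that division of labor concrete, here is a minimal Python sketch of a scrub pass followed by the software-side decision. It is a toy model, not Intel's or VMware's implementation: the `Page` and `Owner` types, the `flipped_bits` counter standing in for real ECC decoding, and the policy in `handle_uncorrectable` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Owner(Enum):
    HYPERVISOR = "hypervisor"
    VM = "vm"
    FREE = "free"


@dataclass
class Page:
    number: int
    owner: Owner
    vm_name: Optional[str] = None
    flipped_bits: int = 0      # toy stand-in for real ECC syndrome decoding
    contained: bool = False    # True once the page is walled off


def scrub(pages: List[Page]) -> List[Page]:
    """One patrol pass: fix single-bit errors, contain and report the rest."""
    uncorrectable = []
    for page in pages:
        if page.flipped_bits == 1:
            # Correctable: rewrite the data, which regenerates the ECC bits.
            page.flipped_bits = 0
        elif page.flipped_bits >= 2:
            # Uncorrectable (double-bit or worse): contain and flag to software.
            page.contained = True
            uncorrectable.append(page)
    return uncorrectable


def handle_uncorrectable(page: Page) -> str:
    """Hypothetical software policy for deciding how serious the error is."""
    if page.owner is Owner.HYPERVISOR:
        return "reboot platform"
    if page.owner is Owner.VM:
        return f"terminate {page.vm_name}"
    return "retire page"
```

In this model only an error in a hypervisor-owned page takes the whole platform down; an error in a guest's page costs that one virtual machine.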
"In previous systems," Brunner said, "the hardware and the operating system would have no choice but to crash."
Advanced Memory Power Management technologies are further down the road, but Intel and VMware have a few tricks under investigation. According to Saxena, for example, Intel is currently experimenting with three new memory states and how they might reduce the power needed to run memory.
The three experimental states are Memory Self Refresh, Memory Standby, and Memory Offline, all of which use the power-management technology known as ACPI (advanced configuration and power interface) to set a server's memory into one of the three states depending upon its needs.
Saxena claimed that memory in self-refresh mode requires about half the amount of power of memory in the standard "active idle" mode. Memory on standby would use only about a third of active-idle power, and memory that's placed offline - to no surprise - requires no power whatsoever.
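As a rough illustration of what those ratios imply, the Python sketch below averages relative power across a set of riser cards. The ratios are the approximate figures Saxena quoted; the state names, the `relative_memory_power` helper, and the assumption that every card draws the same active-idle power are ours.

```python
from fractions import Fraction

# Approximate power of each state relative to "active idle", per the talk.
RELATIVE_POWER = {
    "active_idle": Fraction(1),
    "self_refresh": Fraction(1, 2),
    "standby": Fraction(1, 3),
    "offline": Fraction(0),
}


def relative_memory_power(card_states):
    """Memory power as a fraction of the all-active-idle baseline.

    Assumes every riser card draws the same power when active idle.
    """
    if not card_states:
        raise ValueError("need at least one riser card")
    total = sum(RELATIVE_POWER[state] for state in card_states)
    return total / len(card_states)


# Eight riser cards: four busy, two in self-refresh, two taken offline.
states = ["active_idle"] * 4 + ["self_refresh"] * 2 + ["offline"] * 2
print(relative_memory_power(states))  # 5/8 of the baseline
```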
The controls work at the memory riser-card level. A four-socket Nehalem EX has eight riser cards, two per socket; each riser card has two Millbrook controllers, and each Millbrook controls four DIMMs. If each of those DIMMs is an 8GB module, the total memory will be half a terabyte - and half a terabyte requires a lot of power to keep lit.
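The half-terabyte figure follows directly from that topology, as this short Python check shows. The constant names are ours; the counts are the ones given above.

```python
SOCKETS = 4
RISERS_PER_SOCKET = 2
CONTROLLERS_PER_RISER = 2    # Millbrook controllers on each riser card
DIMMS_PER_CONTROLLER = 4
GB_PER_DIMM = 8

risers = SOCKETS * RISERS_PER_SOCKET                            # 8 riser cards
dimms = risers * CONTROLLERS_PER_RISER * DIMMS_PER_CONTROLLER   # 64 DIMMs
total_gb = dimms * GB_PER_DIMM                                  # 512 GB
gb_per_riser = total_gb // risers                               # 64 GB per card

print(f"{dimms} DIMMs, {total_gb} GB total, {gb_per_riser} GB per riser card")
```

Because the power controls operate per riser card, each state change in this configuration affects 64GB at a time.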
Brunner said that VMware has an experimental build of its vSphere virtualization suite that takes advantage of the equally experimental trio of memory states supported by Intel hardware. Essentially, the hardware and software work together to inform each other which memory pages are in use and which aren't needed and can thus be set to lower-power states. Brunner said that tests in VMware's labs have shown power savings of up to 75 per cent in certain situations.
VMware is also experimenting with another technology that can swap memory pages around, essentially defragmenting memory banks in much the same way a hard drive is defragged. By doing so, memory pages in use by VMs or hypervisors could be consolidated onto some riser cards, allowing other riser cards to be taken offline entirely.
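The consolidation idea can be sketched in a few lines of Python. This is not VMware's algorithm, which was not described in detail; the `consolidate` function, its greedy choice of the fullest cards as destinations, and the page counts in the example are assumptions for illustration.

```python
import math


def consolidate(used_pages, capacity):
    """Pack in-use pages onto as few riser cards as possible.

    used_pages: in-use page count on each riser card
    capacity:   maximum pages a single riser card can hold
    Returns (new_usage, offline_cards), where offline_cards lists the
    indices of cards left empty and therefore eligible to go offline.
    """
    total = sum(used_pages)
    if total > capacity * len(used_pages):
        raise ValueError("more pages in use than the cards can hold")
    needed = math.ceil(total / capacity)
    # Heuristic: keep the fullest cards as destinations and drain the rest.
    order = sorted(range(len(used_pages)),
                   key=lambda i: used_pages[i], reverse=True)
    new_usage = [0] * len(used_pages)
    remaining = total
    for i in order[:needed]:
        new_usage[i] = min(capacity, remaining)
        remaining -= new_usage[i]
    offline = [i for i, pages in enumerate(new_usage) if pages == 0]
    return new_usage, offline


# Four cards holding up to 64 pages each, lightly and unevenly used.
print(consolidate([10, 60, 5, 30], capacity=64))
# ([0, 64, 0, 41], [0, 2])
```

Here 105 in-use pages fit on two cards, so the other two can be powered down.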
While none of these advanced memory-management techniques will be available in the near future, they could one day add up to massive data-center power savings - not only in the juice required to run the servers, but also in the systems required to keep them running within temperature specs. ®
New on x86
Our session did not imply that this was new to the overall computer industry; our point was that mainframe techniques are coming to x86 and that some of these are new to x86 servers.
RAS on memory is not very new......
It was done by Tandem and Fujitsu on *their* SPARC (not Sun's) many years ago, already.
IBM's Power is going to have it with P7.
Just a few tiny corrections ...
Thanks so much for coming to the session yesterday!
Actually, I think I said in the slides that Distributed Power Management, a shipping feature of VMware ESX 3.5 and vSphere 4.0, was able to save 73% power in a real world case that was cited. The memory power mgmt demo showed that we could save ~100W - ~200W (in a system showing 1000W partially idled) by placing half of the DIMMs in the system into software-controlled standby or offline.
Just to clarify, the patrol scrubber is a background autonomous hardware process that is configured by BIOS. Without affecting normal system operation, the memory controller will continuously perform read/write operations on memory, correcting any soft errors that may exist in memory. The write will re-generate the ECC bits, update them if you will. If in the process of the read, an uncorrectable memory error is encountered (double-bit or higher), then an error (such as MCE) can be flagged to software and in older systems, the system would need to reboot. Our demo used new Intel technology that can remove this need if the error hits non-hypervisor pages. (For a quick description of memory scrubbing, see this link: http://h20000.www2.hp.com/bc/docs/support/SupportManual/c00218059/c00218059.pdf)
- Rich Brunner