Intel and VMware get their RAS on

Server memory futures

IDF Not content with adding comprehensive power-saving features to its processors, Intel is working on extending advanced power management for server memory as well as improving memory-error protection.

These improvements are under the umbrellas of RAS - a term originated 'way back when' by IBM to denote technologies related to reliability, availability, and serviceability - and Memory Power Management.

The future of both was detailed during a tag-team tech talk by VMware's Rich Brunner and Intel's Sunil Saxena at this week's Intel Developer Forum in San Francisco.

Brunner succinctly described the need for RAS as applied to memory: "Right now, just one hardware or software error can cause a platform to reboot. And the hundreds or potentially thousands of virtual machines that you might be running on that platform? They've all got to go down and reboot. I think in the vernacular we say, 'That sucks.'"

The need for RAS is growing because of the increased amounts of memory found in x86 systems. "Because of these scale-up servers," Brunner said, "we end up having an immense memory footprint - a tremendous amount of memory from an x86 perspective. One terabyte from an x86 perspective is a lot of memory."

Although Memory RAS encompasses such familiar concepts as ECC-protected cache and memory, Brunner and Saxena focused on recovery from uncorrectable memory errors (UMEs) and the ability to shut down unused memory, thus saving power.

Managing UMEs is enhanced in Intel's upcoming Nehalem EX processor, which should appear early next year on a motherboard near you. That eight-core multi-socket beast will include a Memory Patrol Scrubber, which Brunner described as an "autonomous little engine in the background that is constantly fetching and updating ECC patterns," adding that "It's controlled completely by firmware - [software] doesn't have any interaction with it. It's a pretty cool little thing."

If the Memory Patrol Scrubber finds an uncorrectable error, it can instruct the Nehalem EX to "contain" that memory - meaning to wall it off from use or from being written to disk - and the software managing the processor can then decide how serious the problem is, and either go down or simply terminate the affected application or virtual machine and go on its merry way.

"In previous systems," Brunner said, "the hardware and the operating system would have no choice but to crash."

Advanced Memory Power Management technologies are further down the road, but Intel and VMware have a few tricks under investigation. According to Saxena, for example, Intel is currently experimenting with three new memory states and how they might reduce the power needed to run memory.

The three experimental states are Memory Self Refresh, Memory Standby, and Memory Offline, all of which use the power-management technology known as ACPI (Advanced Configuration and Power Interface) to put a server's memory into one of the three states depending upon its needs.

Saxena claimed that memory in self-refresh mode requires about half the power of memory in the standard "active idle" mode. Memory on standby would use only about a third of active-idle power, and memory that's placed offline - to no surprise - requires no power whatsoever.
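To put those fractions in perspective, here is a rough back-of-the-envelope sketch. It assumes every memory unit draws the same power in active idle and takes Saxena's approximate figures at face value; the state names and the `relative_power` helper are illustrative, not part of any Intel or ACPI interface.

```python
# Approximate power per state as a fraction of "active idle" power,
# using the rough figures quoted from Saxena. Names are hypothetical.
RELATIVE_POWER = {
    "active_idle": 1.0,
    "self_refresh": 1 / 2,
    "standby": 1 / 3,
    "offline": 0.0,
}


def relative_power(units_by_state):
    """Return total memory power as a fraction of the all-active-idle case.

    units_by_state maps a state name to the number of equally sized
    memory units currently in that state.
    """
    total_units = sum(units_by_state.values())
    if total_units == 0:
        raise ValueError("need at least one memory unit")
    used = sum(RELATIVE_POWER[state] * count
               for state, count in units_by_state.items())
    return used / total_units


# Eight units: four active, two in self-refresh, two offline.
print(relative_power({"active_idle": 4, "self_refresh": 2, "offline": 2}))  # 0.625
```

Under these assumptions, that mix would draw about five-eighths of the power of the same memory left entirely in active idle.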

The controls work at the memory riser-card level. A four-socket Nehalem EX has eight riser cards, two per socket; each riser card has two Millbrook controllers, and each Millbrook controls four DIMMs. If each of those DIMMs is an 8GB module, the total memory will be half a terabyte - and half a terabyte requires a lot of power to keep lit.
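The arithmetic behind that half-terabyte figure can be checked in a few lines. The counts come straight from the configuration described above; the variable names are just for illustration.

```python
# Configuration as described in the article for a four-socket Nehalem EX.
RISER_CARDS = 8            # two riser cards per socket, four sockets
CONTROLLERS_PER_CARD = 2   # Millbrook controllers per riser card
DIMMS_PER_CONTROLLER = 4
DIMM_SIZE_GB = 8           # the article's example module size

dimm_count = RISER_CARDS * CONTROLLERS_PER_CARD * DIMMS_PER_CONTROLLER
total_gb = dimm_count * DIMM_SIZE_GB
per_card_gb = total_gb // RISER_CARDS

print(dimm_count, total_gb, per_card_gb)  # 64 512 64
```

That is 64 DIMMs and 512GB in total, so each riser card - the level at which the controls work - carries 64GB.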

Brunner said that VMware has an experimental build of its vSphere virtualization suite that takes advantage of the equally experimental trio of memory states supported by Intel hardware. Essentially, the hardware and software work together to inform each other which memory pages are in use and which aren't needed and can thus be set to lower-power states. Brunner said that tests in VMware's labs have shown power savings of up to 75 per cent in certain situations.

VMware is also experimenting with another technology that can swap memory pages around, essentially defragmenting memory banks in much the same way a hard drive is defragged. By doing so, memory pages in use by VMs or hypervisors could be consolidated onto some riser cards, allowing other riser cards to be taken offline entirely.
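The consolidation idea can be illustrated with a toy model. This is only a sketch of the general packing principle, not VMware's algorithm: it assumes every page is movable, ignores the cost of moving pages, and treats each riser card as a bucket with a fixed page capacity. The function names `consolidate` and `offline_candidates` are hypothetical.

```python
def consolidate(used_pages, capacity):
    """Pack in-use pages onto as few riser cards as possible.

    used_pages: in-use page count per riser card, before consolidation.
    capacity:   maximum pages a single riser card can hold.
    Returns the per-card counts after packing pages onto the
    lowest-numbered cards first.
    """
    if any(u < 0 or u > capacity for u in used_pages):
        raise ValueError("each card must hold between 0 and capacity pages")
    remaining = sum(used_pages)
    packed = []
    for _ in used_pages:
        take = min(remaining, capacity)
        packed.append(take)
        remaining -= take
    return packed


def offline_candidates(packed):
    """Return the indices of riser cards left with no in-use pages."""
    return [i for i, used in enumerate(packed) if used == 0]


# Eight riser cards, each able to hold 64 pages, with scattered usage.
before = [30, 10, 0, 25, 5, 0, 20, 10]
after = consolidate(before, capacity=64)
print(after)                      # [64, 36, 0, 0, 0, 0, 0, 0]
print(offline_candidates(after))  # [2, 3, 4, 5, 6, 7]
```

In this toy case only two cards were empty before consolidation; afterwards six of the eight hold no in-use pages and could, in principle, be taken offline.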

While none of these advanced memory-management techniques will be available in the near future, they could one day add up to massive data-center power savings - not only in the juice required to run the servers, but also in the systems required to keep them running within temperature specs. ®
