Intel and VMware get their RAS on

Server memory futures


IDF Not content with adding comprehensive power-saving features to its processors, Intel is working on extending advanced power management for server memory as well as improving memory-error protection.

These improvements are under the umbrellas of RAS - a term originated 'way back when' by IBM to denote technologies related to reliability, availability, and serviceability - and Memory Power Management.

The future of both was detailed during a tag-team tech talk by VMware's Rich Brunner and Intel's Sunil Saxena at this week's Intel Developer Forum in San Francisco.

Brunner succinctly described the need for RAS as applied to memory: "Right now, just one hardware or software error can cause a platform to reboot. And the hundreds or potentially thousands of virtual machines that you might be running on that platform? They've all got to go down and reboot. I think in the vernacular we say, 'That sucks.'"

The need for RAS is growing because of the increased amounts of memory found in x86 systems. "Because of these scale-up servers," Brunner said, "we end up having an immense memory footprint - a tremendous amount of memory from an x86 perspective. One terabyte from an x86 perspective is a lot of memory."

Although Memory RAS encompasses such familiar concepts as ECC-protected cache and memory, Brunner and Saxena focused on recovery from uncorrectable memory errors (UMEs) and the ability to shut down unused memory, thus saving power.

Managing UMEs is enhanced in Intel's upcoming Nehalem EX processor, which should appear early next year on a motherboard near you. That eight-core multi-socket beast will include a Memory Patrol Scrubber, which Brunner described as an "autonomous little engine in the background that is constantly fetching and updating ECC patterns," adding that "It's controlled completely by firmware - [software] doesn't have any interaction with it. It's a pretty cool little thing."

If the Memory Patrol Scrubber finds an uncorrectable error, it can instruct the Nehalem EX to "contain" that memory - meaning to wall it off from use or from being written to disk - and the software managing the processor can then decide how serious the problem is, and either go down or simply terminate the affected application or virtual machine and go on its merry way.

"In previous systems," Brunner said, "the hardware and the operating system would have no choice but to crash."

Advanced Memory Power Management technologies are further down the road, but Intel and VMware have a few tricks under investigation. According to Saxena, for example, Intel is currently experimenting with three new memory states and how they might reduce the power needed to run memory.

The three experimental states are Memory Self Refresh, Memory Standby, and Memory Offline, all of which use the power-management technology known as ACPI (advanced configuration and power interface) to set a server's memory into one of the three states depending upon its needs.

Saxena claimed that memory in self-refresh mode requires about half the power of memory in the standard "active idle" mode. Memory on standby would use only about a third of active-idle power, and memory that's placed offline - to no surprise - requires no power whatsoever.
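To put rough numbers on those claims, here is a minimal Python sketch that averages relative power across a set of equally sized memory regions. The one-half and one-third figures are Saxena's approximations as quoted above; the state names, the `RELATIVE_POWER` table, and the `relative_memory_power` function are hypothetical and are not part of any Intel or ACPI interface.

```python
from fractions import Fraction

# Approximate power of each state relative to "active idle", as quoted in the talk.
# The dictionary keys are illustrative labels, not real ACPI identifiers.
RELATIVE_POWER = {
    "active_idle": Fraction(1),
    "self_refresh": Fraction(1, 2),   # "about half"
    "standby": Fraction(1, 3),        # "about a third"
    "offline": Fraction(0),           # no power at all
}


def relative_memory_power(states):
    """Average power of equally sized memory regions, relative to all-active-idle."""
    if not states:
        raise ValueError("at least one memory region is required")
    return sum(RELATIVE_POWER[s] for s in states) / len(states)


if __name__ == "__main__":
    # Example mix: two regions busy, two in self-refresh, four in standby.
    mix = ["active_idle"] * 2 + ["self_refresh"] * 2 + ["standby"] * 4
    print(float(relative_memory_power(mix)))
```

The sketch assumes every region draws the same power when active, which real hardware need not do, so treat the result as a back-of-the-envelope estimate only.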

The controls work at the memory riser-card level. A four-socket Nehalem EX has eight riser cards, two per socket; each riser card has two Millbrook controllers, and each Millbrook controls four DIMMs. If each of those DIMMs is an 8GB module, the total memory will be half a terabyte - and half a terabyte requires a lot of power to keep lit.
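The half-terabyte figure follows directly from the topology described above, as this short Python check shows. The counts come from the article; the constant and function names are made up for illustration.

```python
# Topology as described in the article; names are illustrative only.
SOCKETS = 4
RISERS_PER_SOCKET = 2
CONTROLLERS_PER_RISER = 2   # Millbrook controllers per riser card
DIMMS_PER_CONTROLLER = 4
DIMM_SIZE_GB = 8


def total_riser_cards():
    """Riser cards in the whole system."""
    return SOCKETS * RISERS_PER_SOCKET


def total_dimms():
    """DIMM slots across all riser cards."""
    return total_riser_cards() * CONTROLLERS_PER_RISER * DIMMS_PER_CONTROLLER


def total_memory_gb():
    """Installed memory in GB when every slot holds one DIMM_SIZE_GB module."""
    return total_dimms() * DIMM_SIZE_GB


if __name__ == "__main__":
    print(total_riser_cards(), "riser cards,", total_dimms(), "DIMMs,",
          total_memory_gb(), "GB")
```

Eight riser cards carry 64 DIMMs in all, and 64 modules of 8GB each come to 512GB.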

Brunner said that VMware has an experimental build of its vSphere virtualization suite that takes advantage of the equally experimental trio of memory states supported by Intel hardware. Essentially, the hardware and software work together to inform each other what memory pages are in use, which aren't needed and which can thus be set to lower usage states. Brunner said that tests in VMware's labs have shown power savings of up to 75 per cent in certain situations.

VMware is also experimenting with another technology that can swap memory pages around, essentially defragmenting memory banks in much the same way a hard drive is defragged. By doing so, memory pages in use by VMs or hypervisors could be consolidated onto some riser cards, allowing other riser cards to be taken offline entirely.
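The consolidation idea can be illustrated with a toy Python model. This is only a sketch under simple assumptions (every riser card has the same page capacity, and any page can move to any card); it is not VMware's algorithm, and the `consolidate` function is hypothetical.

```python
def consolidate(used_pages, capacity):
    """Pack in-use pages onto as few riser cards as a simple greedy pass allows.

    used_pages: pages currently in use on each riser card.
    capacity:   page capacity of every card (assumed identical).
    Returns (new_layout, offline_cards), where offline_cards lists the
    indices of cards left with no pages in use.
    """
    if any(u < 0 or u > capacity for u in used_pages):
        raise ValueError("usage must be between 0 and capacity")
    layout = list(used_pages)
    # Visit cards from fullest to emptiest: fill the fullest, drain the emptiest.
    order = sorted(range(len(layout)), key=lambda i: layout[i], reverse=True)
    dst, src = 0, len(order) - 1
    while dst < src:
        d, s = order[dst], order[src]
        free = capacity - layout[d]
        if free == 0:          # destination card is full, move to the next one
            dst += 1
            continue
        if layout[s] == 0:     # source card is already empty
            src -= 1
            continue
        moved = min(free, layout[s])
        layout[d] += moved
        layout[s] -= moved
    offline = [i for i, u in enumerate(layout) if u == 0]
    return layout, offline


if __name__ == "__main__":
    print(consolidate([3, 1, 2, 0], capacity=4))
```

A real hypervisor would also have to weigh the cost of copying pages and any NUMA locality effects, which this model ignores.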

While none of these advanced memory-management techniques will be available in the near future, they could one day add up to massive data-center power savings - not only in the juice required to run the servers, but also in the systems required to keep them running within temperature specs. ®
