Feeds

Intel and VMware get their RAS on

Server memory futures

  • alert
  • submit to reddit

Top 5 reasons to deploy VMware with Tegile

IDF Not content with adding comprehensive power-saving features to its processors, Intel is working on extending advanced power management for server memory as well as improving memory-error protection.

These improvements are under the umbrellas of RAS - a term originated 'way back when' by IBM to denote technologies related to reliability, availability, and servicability - and Memory Power Management.

The future of both were detailed during a tag-team tech talk by VMware's Rich Brunner and Intel's Sunil Saxena at this week's Intel Developer Forum in San Francisco.

Brunner succinctly described the need for RAS as applied to memory: "Right now, just one hardware or software error can cause a platform to reboot. And the hundreds or potentially thousands of virtual machines that you might be running on that platform? They've all got to go down and reboot. I think in the vernacular we say, 'That sucks.'"

The need for RAS is growing because of the increased amounts of memory found in x86 systems. "Because of these scale-up servers," Brunner said, "we end up having an immense memory footprint - a tremendous amount of memory from an x86 perspective. One terabyte from an x86 perspective is a lot of memory."

Although Memory RAS encompasses such familiar concepts as ECC-protected cache and memory, Brunner and Saxena focused on recovery from uncorrectable memory errors (UMEs) and the ability to shut down unused memory, thus saving power.

Managing UMEs is enhanced in Intel's upcoming Nehalem EX processor, which should appear early next year on a motherboard near you. That eight-core multi-socket beast will include a Memory Patrol Scrubber, which Brunner described as an "autonomous little engine in the background that is constantly fetching and updating ECC patterns," adding that "It's controlled completely by firmware - [software] doesn't have any interaction with it. It's a pretty cool little thing."

If the Memory Patrol Scrubber finds an uncorrectable error, it can instruct the Nehalem EX to "contain" that memory - meaning to wall it off from use or from being written to disk - and the software managing the processor can then decide how serious the problem is, and either go down or simply terminate the affected application or virtual machine and go on its merry way.

"In previous systems," Brunner said, "the hardware and the operating system would have no choice but to crash."

Advanced Memory Power Management technologies are further down the road, but Intel and VMware have a few tricks under investigation. According to Saxena, for example, Intel is currently experimenting with three new memory states and how they might reduce the power needed to run memory.

The three experimental states are Memory Self Refresh, Memory Standby, and Memory Offline, all of which use the power-management technology known as ACPI (advanced configuration and power interface) to set a server's memory into one of the three states depending upon its needs.

Saxena claimed that memory in self-refresh mode requires about half the amount of power of memory in the standard "active idle" mode. Memory on standby would use only about a third of active-idle power, and memory that's placed offline - to no surprise - requires no power whatsover.

The controls work at the memory riser-card level. A four-socket Nehalem EX has eight riser cards, two per socket; each riser card has two Millbrook controllers, and each Millbrook controls four DIMMs. If each of those DIMMs is an 8GB module, the total memory will be half a terabyte - and half a terabyte requires a lot of power to keep lit.

Brunner said that VMware has an experimental build of its vSphere virtualization suite that takes advantage of the equally experimental trio of memory states supported by Intel hardware. Essentially, the hardware and software work together to inform each other what memory pages are in use, which aren't needed and which can thus be set to lower usage states. Brunner said that tests in VMware's labs have shown power savings of up to 75 per cent in certain situations.

Vmware is also experimenting with another technology that can swap memory pages around, essentially defragmenting memory banks much in the same way a hard drive is defragged. By doing so, memory pages in use by VMs or hypervisors could be consolidated onto some riser cards, allowing other riser cards to be taken offline entirely.

While none of these advanced memory-management techniques will be available in the near future, they could one day add up to massive data-center power savings - not only in the juice required to run the servers, but also in the systems required to keep them running within temperature specs. ®

Choosing a cloud hosting partner with confidence

More from The Register

next story
BOFH: WHERE did this 'fax-enabled' printer UPGRADE come from?
Don't worry about that cable, it's part of the config
Azure TITSUP caused by INFINITE LOOP
Fat fingered geo-block kept Aussies in the dark
You think the CLOUD's insecure? It's BETTER than UK.GOV's DATA CENTRES
We don't even know where some of them ARE – Maude
Want to STUFF Facebook with blatant ADVERTISING? Fine! But you must PAY
Pony up or push off, Zuck tells social marketeers
Yahoo! blames! MONSTER! email! OUTAGE! on! CUT! CABLE! bungle!
Weekend woe for BT as telco struggles to restore service
Oi, Europe! Tell US feds to GTFO of our servers, say Microsoft and pals
By writing a really angry letter about how it's harming our cloud business, ta
prev story

Whitepapers

Why and how to choose the right cloud vendor
The benefits of cloud-based storage in your processes. Eliminate onsite, disk-based backup and archiving in favor of cloud-based data protection.
A strategic approach to identity relationship management
ForgeRock commissioned Forrester to evaluate companies’ IAM practices and requirements when it comes to customer-facing scenarios versus employee-facing ones.
Reg Reader Research: SaaS based Email and Office Productivity Tools
Read this Reg reader report which provides advice and guidance for SMBs towards the use of SaaS based email and Office productivity tools.
Managing SSL certificates with ease
The lack of operational efficiencies and compliance pitfalls associated with poor SSL certificate management, and how the right SSL certificate management tool can help.
Intelligent flash storage arrays
Tegile Intelligent Storage Arrays with IntelliFlash helps IT boost storage utilization and effciency while delivering unmatched storage savings and performance.