Original URL: http://www.theregister.co.uk/2010/10/25/nec_fault_tolerant_server_hyper_v/

NEC cranks fault-tolerant servers with Xeon six-shooters

Pitting Hyper-V plus hardware FT against VMware virty FT

By Timothy Prickett Morgan

Posted in Servers, 25th October 2010 09:42 GMT

Fault-tolerant server maker NEC today rolls out an upgraded FT Series box sporting Intel's "Westmere-EP" Xeon 5600 processors and is also delivering support for Microsoft's Hyper-V server virtualization hypervisor, setting the stage for a fight between VMware's software-based fault tolerance and the combination of hardware-based fault tolerance and Hyper-V.

I know what you are thinking - weren't the Xeon 5600s launched in the middle of March, with lots of machines coming into the field soon thereafter? True. But when you make fault-tolerant servers where you are guaranteeing they will provide a minimum of five nines of availability, at the same time that you are selling a box not only as an application server but now as a virtualized application and database server, you need to do a lot of extra testing and certification.
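To put the five nines figure in perspective, here is a minimal sketch that converts an availability target into the downtime it allows per year. The function name and the 365-day year are assumptions made for illustration; nothing here comes from NEC's own service terms.

```python
# Sketch: convert an availability target ("N nines") into allowed downtime.
# Assumes a 365-day year; leap years are ignored for simplicity.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_minutes_per_year(nines: int) -> float:
    """Allowed downtime, in minutes per year, for an availability of N nines."""
    unavailability = 10.0 ** -nines  # five nines -> 0.00001
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"{n} nines: {downtime_minutes_per_year(n):.2f} minutes/year")
```

At five nines, the budget works out to a little over five minutes of downtime per year, which gives a sense of why the qualification cycle is long.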

Mike Mitsch, general manager for the IT Platform Group at NEC America, says that the company could have probably got either the new Express5800/R320b-M4 server using the six-core Xeon 5600 processors or support for Microsoft's Hyper-V server virtualization hypervisor out the door earlier and separately, but the company (like its platform partner, Stratus Technologies, did last week) decided to get the new iron and Hyper-V support out simultaneously.

The Xeon 5000 series processors have special lockstepping functions that NEC and Stratus make use of to create tightly coupled, two-node servers that flip bits in exact synchrony such that if a software error takes out one node, users don't even notice it when they are switched over to the backup node. While this fault tolerance doesn't come cheap - for a given two-socket workload, it costs about five times as much to give it fault tolerance capabilities - the FT setup is considerably less complex than using clustering software and replicating data and applications without lockstepping.

So NEC and Stratus can charge a premium for the GeminiEngine chipset that provides the fault detection and isolation that they add to NEC 5800 series servers to make use of that lockstepping inherent in the processor and turn it into an FT cluster. The GeminiEngine keeps CPU, memory, disk, and network processing in absolute synchronization across the two nodes.

NEC Express5800 FT Series Schematic

Conceptual diagram of the Express5800 FT servers

However, the FT server market is small enough that the two companies cannot afford to indulge in providing fault tolerance for four-socket or larger x64 servers, although both have flirted with the idea in years gone by and even went so far as to have four-socket FT machines based on earlier Xeon MP processors from Intel in their skunkworks. The Itanium 9100 series processors, meanwhile, had core-level lockstepping, which is different from the socket-level lockstepping that NEC and Stratus use in their Xeon-based FT machines.

NEC sells both Xeon and Itanium machines and was never tempted to make an Itanium-based FT box using that chip's core-level lockstepping because it is much harder to do than socket-level lockstepping, according to Mitsch. Advanced Micro Devices' Opteron processors do not have the lockstepping circuits that the Xeon 5000 series do, which is why you don't see FT machines built on the Opteron processors.

The Express5800/R320b-M4 does not have a catchy name, but this two-node FT cluster does have Intel's 2.93 GHz, six-core Xeon X5670 processor. By moving to these six-core chips in the R320b-M4, NEC can provide somewhere between 40 and 50 per cent more oomph than the R320a-E4 and R320a-M4 machines that it announced last August, which use the four-core Xeon E5504 (2 GHz) and X5570 (2.93 GHz) processors, respectively. (These were nicknamed the "Nehalem-EP" processors, and these chips gave Intel back the x64 server space and put AMD in the backseat of the market, where it was before the Opteron debut in 2003.)

The new R320b-M4 (and NEC, you really need to get a better naming scheme so dyslexics can tell them apart) has the same 96 GB maximum of main memory, but unlike the other two machines, the memory speed is boosted to 1.33 GHz instead of running at 1.07 GHz.

Moreover, the Westmere-EP box sports 600 GB, 2.5-inch disks in its bays, for a maximum of 4.8 TB, double that of the two smaller (in terms of performance) FT servers in the NEC lineup, which still have 300 GB disks.

NEC Express 5800 R320b-M4

The NEC Express 5800 R320b-M4 with its faceplate removed

In a base configuration with a pair of servers with no memory or disk but with one Xeon X5670 processor and a three-year warranty on the iron, NEC is charging $17,000. If you want to add the second processor, cough up another $2,700 each.

That is not the price for a logical computing unit (two physical chips, one each per node in the R320b-M4 machine), but rather for a physical processor that has been put through the NEC qualification process for the Xeon X5670s, of which you need to add two to double the processing capacity of the logical machine.

At first glance, this seems a bit pricey compared to Intel's list price, which is $1,440 for the same chip, but you have to remember that this NEC price is for a onesie that has had rigorous testing while the Intel price is for a 1,000-unit tray of chips. If you want to buy 1,000 FT servers from NEC, I am certain you can get a decent volume price.
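The processor pricing above is easier to follow as arithmetic. The sketch below uses only the dollar figures quoted in this story. The constant and function names are made up for illustration, and it assumes the $17,000 base price already covers one X5670 in each of the two nodes.

```python
# Sketch of the processor pricing arithmetic, using figures quoted in the story.
# Assumption: the base price already includes one X5670 per node.
BASE_PRICE = 17_000       # two-node base config, no memory or disk, 3-year warranty
NEC_CPU_PRICE = 2_700     # NEC price per additional physical X5670
INTEL_LIST_PRICE = 1_440  # Intel list price per chip in 1,000-unit trays
NODES = 2                 # each logical processor is one physical chip per node


def price_with_second_logical_cpu() -> int:
    """Base price plus one extra physical chip in each node."""
    return BASE_PRICE + NODES * NEC_CPU_PRICE


def nec_markup_over_intel_list() -> float:
    """Ratio of NEC's per-chip price to Intel's tray price."""
    return NEC_CPU_PRICE / INTEL_LIST_PRICE


if __name__ == "__main__":
    print(f"Two logical processors: ${price_with_second_logical_cpu():,}")
    print(f"NEC price vs Intel list: {nec_markup_over_intel_list():.3f}x")
```

Under those assumptions, doubling the logical processing capacity adds $5,400 to the base price, and NEC's per-chip price is a bit under twice Intel's tray price.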

Fault-tolerant servers can be carved up using hypervisors, but once again, NEC and Stratus do a lot of additional testing so they can maintain the availability levels for which they are charging a substantial premium. Prior Express5800 FT servers supported VMware's ESX Server 3.5 hypervisor, which Mitsch said NEC did without making a lot of fuss, and in the wake of VMware's vSphere 4.0 announcement in the summer of 2009, the company eventually put support for ESX Server 4.0 out for its Express5800 FT machines. Mitsch says that NEC is testing support for ESX Server 4.1 right now and will get it out the door as soon as possible on the three machines in its current lineup.

The Express5800 fault tolerant machines now also support Microsoft's Hyper-V R2 hypervisor, which is a requirement NEC is getting from many customers who want to have tight clustering for virtualized Exchange Server groupware and SQL Server databases alongside their applications. Up until now, companies were buying multiple FT setups from NEC and networking them together, but now a lot of customers will be able to virtualize and put all of the applications on a single two-node cluster.

Windows Server 2003, Windows Server 2008, and Windows Server 2008 R2 are supported as operating systems on the NEC FT machines, whether virtualized or running on the bare metal, and so is Red Hat's Enterprise Linux 5.5 operating system. However, formal support for the KVM hypervisor is not yet available, even though it is embedded in RHEL 5.5.

"We are looking at KVM," says Mitsch. "We have the technology to enable it, so it is not a matter of if, but of when. But we don't have customer demand for it yet."

What NEC and Stratus alike have is a strong partner in Microsoft, which does not have software-based fault tolerant features encoded in its hypervisor, as VMware does in the fault tolerance feature in the vSphere 4.1 stack. With VMware's FT feature, the company is providing lockstepping over the network between two server nodes and making use of its VMware HA cluster feature.

The problem with the VMware FT is that it is limited to scaling to one virtual core. NEC's hardware-based fault tolerance can scale a virtual machine image to the size limits of the current hypervisors, which is four cores for Hyper-V and eight cores for ESX Server.

Maybe VMware needs to buy a hardware vendor?

NEC is taking orders for the Express5800/R320b-M4 server starting today, and the machine will start shipping some time in November. The machine comes standard as a rack-mounted cluster, but there is a tower adapter kit if you want to roll it into an office environment, as many NEC customers do. ®