Stratus girds fault-tolerant servers with Xeon E5s
Virtualization that won't go down on you
When it comes to hardware-based fault-tolerant computing on x86 iron, there are really only two games in town: NEC and Stratus Technologies. For the past several generations of machines, these two companies have partnered, with Stratus essentially taking NEC's iron and weaving in its own system tools, packaging and pricing.
Stratus also took its fault-tolerance (FT) smarts and embedded them inside a XenServer hypervisor to create its Avance software, which clusters XenServer hypervisors and their virtual machines on two physical servers. This gives those virtual machines high availability that is easier to manage than traditional system clustering and that is transparent to Windows and Linux applications. Avance 3.0 was announced at the end of July and included, among other features, tweaks to support Intel's Xeon E5 family of server chips.
With fault-tolerant machines, you go all the way and take two completely identical physical servers and use an add-on chipset to keep every bit and every clock on each machine in absolute lockstep; in the event of an unrecoverable hardware fault on one side of the pair, its twin keeps processing with no interruption seen by end users.
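To make the lockstep idea concrete, here is a toy Python model. It is only a conceptual sketch: the `Replica` and `LockstepPair` classes and the arithmetic inside `step()` are hypothetical stand-ins, and real lockstepping is done by the Gemini chipset at the clock and memory level, not in software. Two replicas apply identical inputs, their outputs are checked for agreement while both are healthy, and when one side suffers a fault the survivor carries on with the same state.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One half of a lockstepped pair (hypothetical toy model)."""
    name: str
    state: int = 0
    healthy: bool = True

    def step(self, value: int) -> int:
        # Both replicas apply exactly the same deterministic operation.
        self.state = self.state * 31 + value
        return self.state


@dataclass
class LockstepPair:
    a: Replica = field(default_factory=lambda: Replica("A"))
    b: Replica = field(default_factory=lambda: Replica("B"))

    def step(self, value: int) -> int:
        live = [r for r in (self.a, self.b) if r.healthy]
        if not live:
            raise RuntimeError("both replicas have failed")
        outputs = [r.step(value) for r in live]
        # While both sides are healthy, their outputs must agree exactly.
        if len(set(outputs)) != 1:
            raise RuntimeError("lockstep divergence detected")
        return outputs[0]

    def fail(self, name: str) -> None:
        # Simulate an unrecoverable hardware fault detected on one side.
        for r in (self.a, self.b):
            if r.name == name:
                r.healthy = False


pair = LockstepPair()
results = [pair.step(v) for v in (1, 2, 3)]
pair.fail("A")                      # side A drops out mid-run
results += [pair.step(v) for v in (4, 5)]
print(results)  # prints [1, 33, 1026, 31810, 986115]
```

The output sequence after side A fails is exactly what an unfaulted pair would have produced, which is the "no interruption seen by end users" property in miniature.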
Stratus, founded in 1980, used to make its own fault-tolerant servers, based on Motorola 68K processors, then Intel's i860 RISC chips, then Hewlett-Packard's PA-RISC processors, and then finally Intel's Xeon 5500s. These latter Xeon 5500 machines are known as the V-Series, and they run the VOS operating system from Stratus and include the processor, memory, and lockstepping electronics created by Stratus. For the past decade, Stratus has partnered with Japanese server maker NEC to bring its ftServer line to market, which uses the Gemini lockstepping chipset created by NEC paired with two-socket Express5800 server nodes.
It always takes Stratus and NEC a little longer to bring their fault-tolerant servers to market than plain-vanilla Xeon servers, and it has been six months since Intel launched the "Sandy Bridge-EP" Xeon E5-2600 processors. Denny Lane, director of product marketing at Stratus, said the two FT server partners beefed up the speed with which the Gemini chipset extension to Intel's C206 variant of the "Patsburg" chipset can perform a failover on a system with considerably larger main memory than its predecessors.
The sixth generation ftServer
The reason main memory capacity and failover speed are important is this: server virtualization. "While hardware keeps getting more reliable over time, the bar for availability keeps going up and up. And if anything, we are seeing increasing interest in fault tolerance for virtualized workloads," Lane said.
Not that traditional FT workloads are not still driving ftServer sales. For instance, every nuclear power plant based on Siemens designs has a slew of ftServers from Stratus controlling its operations.
With the move to the Xeon E5 and the modified Gemini chipset extension for FT lockstepping of the servers, Stratus has decided to put three different configurations into the field:
The feeds and speeds of the sixth-generation ftServers
The two top-end models, the ftServer 4700 and 6400 machines, sport 256GB of maximum main memory spanning those sixteen logical cores, up from 96GB of memory across a dozen logical cores. Fault-tolerant machines have to keep their memories in sync, and this gets progressively harder as memories get bigger, which is why you never see FT machines with anything close to the maximum memory supported by a given Xeon chip. With the fifth generation machines, Stratus was delivering 8GB of memory per core, but this has been doubled to 16GB per core with the sixth generation machines. The E5-2600 has four memory channels per socket, up from three on the Xeon 5600 processors it replaces, so there is considerably more memory bandwidth for the Gemini chipset to deal with.
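The per-core figures follow directly from the capacities quoted above. A quick Python check using only those numbers (the `gb_per_core` helper and the `generations` table are hypothetical names for illustration, and "cores" means logical cores as seen from one side of the pair):

```python
def gb_per_core(max_memory_gb: int, logical_cores: int) -> float:
    """Maximum main memory divided evenly across the logical cores."""
    return max_memory_gb / logical_cores


# Top-end configurations as described in the text (logical view).
generations = {
    "fifth generation": (96, 12),
    "sixth generation": (256, 16),
}

for name, (memory_gb, cores) in generations.items():
    print(f"{name}: {gb_per_core(memory_gb, cores):.0f}GB per core")
# prints:
# fifth generation: 8GB per core
# sixth generation: 16GB per core
```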
With the current machines, NEC and Stratus were able to beef up the Gemini chipset and memory synchronisation design so that synchronisation is seven times faster. Given that main memory on the machines can be 2.7 times larger than on the prior generation, this means there is some headroom before a blackout condition, where one node stalls waiting for the other, can occur in the box.
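To see where that headroom comes from, here is a back-of-the-envelope Python sketch. It assumes, purely for illustration, that the time needed to bring a partner node's memory into sync grows linearly with memory size and shrinks linearly with synchronisation throughput; Stratus has not said the relationship is that simple, and `relative_sync_time` is a hypothetical helper.

```python
def relative_sync_time(memory_ratio: float, speedup: float) -> float:
    """Sync time relative to the previous generation.

    Hypothetical linear model: time is proportional to memory size
    and inversely proportional to synchronisation throughput.
    """
    return memory_ratio / speedup


memory_ratio = 256 / 96   # 256GB vs 96GB, the "2.7 times larger" in the text
speedup = 7.0             # synchronisation is seven times faster, per the text
ratio = relative_sync_time(memory_ratio, speedup)
print(f"memory grew {memory_ratio:.2f}x, sync time is {ratio:.0%} of the old figure")
# prints: memory grew 2.67x, sync time is 38% of the old figure
```

Under that assumption, syncing a maxed-out sixth-generation box takes roughly 38 per cent of the time the prior generation needed, which is the margin that keeps blackouts at bay.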
Stratus is offering customers three performance points and three price points in the sixth generation ftServer lineup. Logically speaking, the base ftServer 2700 machine has one four-core 1.8GHz E5-2603 processor and eight memory slots that support up to 32GB of memory using 8GB memory sticks. (Physically, all components are doubled up.) The machine has two Gigabit Ethernet ports for workloads and two PCI-Express 2.0 x4 peripheral slots. There are six SAS ports on the mobo and the chassis has room for eight 2.5-inch SAS disk drives. Stratus is offering either 7.2K RPM or 15K RPM drives, since customers want either "fat or fast drives", as Lane puts it, and 10K RPM drives are neither. Stratus hasn't sold SATA drives for a couple of ftServer generations now. The servers can also be configured with up to eight 2.5-inch solid-state drives. Stratus has chosen the 200GB Pliant eMLC drives sold by SanDisk for its machines. All this iron fits in a 4U chassis.
The ftServer 4700 adds a second four-core E5-2603 processor and another eight memory slots, boosting maximum memory capacity to 256GB using 16GB sticks. If you want more I/O, you can slide in a riser card that gives you two PCI-Express 2.0 slots and two 10 Gigabit Ethernet ports.
The ftServer 6400 has two eight-core E5-2670 processors and 256GB of max memory, and the extra PCI-Express slots and 10GE ports are standard.
The new ftServer machines deliver quite a bit more performance than the boxes Stratus has sold for the past several years:
Relative performance of Stratus ftServers
That performance jump means that ftServers can run larger database, email, and other application workloads, but Lane tells El Reg that the company does not feel compelled to make a four-socket ftServer at this point. Part of the reason is the "eggs in one basket" problem. Even with a fault-tolerant server, customers are hesitant to put all of their workloads onto a bigger box. They still feel more comfortable spreading them out over multiple machines, picking a few machines as database servers, a few for infrastructure, and so forth. Even if customers could virtualize everything and cram it all onto one four-socket machine, they are jumpy about doing so even when the box is fully replicated by lockstepping hardware.
The new ftServer machines can support Red Hat Enterprise Linux 6 in addition to Microsoft Windows Server 2008 R2 and its related Hyper-V R2 hypervisor. The machines cannot run the current VMware ESXi 5.0 hypervisor, but will be certified to run the just-announced ESXi 5.1 hypervisor by the end of the year.
Stratus is in no hurry to support the new Windows Server 2012 operating system, which launched last week. "Fault-tolerant server customers tend to be operating system laggards," explains Lane. "We still have lots of customers who are still using Windows Server 2003." He adds that customers running ftServers tend to wait until the first service pack update to a Linux or Windows operating system before they go into production, and that the planned upgrade to "Ivy Bridge" Xeon E5 machines near the end of 2013 is probably a good time for Stratus to move to Windows Server 2012.
Stratus did not make configured pricing available for its machines, but says an entry ftServer comes in at around $13,000. ®