Stratus makes $50,000 zero downtime promise
Guarantees VMware performance
Fault-tolerant server company Stratus is betting real money that its latest range of servers running VMware will never fail - for at least six months.
The company is backing its words with action: it has issued a promise that new and existing Stratus users will experience no unplanned downtime with its new top-of-the-range 2.93 GHz X5570 Xeon-based ftServer 6300 systems running VMware vSphere 4 Enterprise and Enterprise Plus.
Under what it has dubbed the zero downtime $50K guarantee, Stratus says it will pay the money in cash - or in product credit if you prefer - should unplanned downtime be caused by failure of either the server or the virtualisation software during the first six months after the system is placed into production. The programme is open to all customers and runs until the end of 2010.
It's the first time Stratus has made that promise with VMware, and the cash "effectively covers the cost of the server," according to Stratus consultant Andy Bailey.
"We're seeing phase two of the move towards virtualisation," said Bailey. "That means more and more customers are wanting to run mission-critical enterprise-level databases inside a virtual machine, but haven't dared to so far. This could be the guarantee that helps swing that decision."
Stratus sells its highly fault-tolerant servers to customers who need high levels of reassurance, such as those running critical applications in financial institutions, cloud infrastructure providers, healthcare and utilities. The company claims its VMware-toting customers include a global financial services firm, a major US government agency, one of the US's leading credit card processors and a 192-bed acute care hospital with 26 outlying care centres.
Stratus reckons all its servers now operating have delivered 99.99989 per cent uptime - that works out to roughly 35 seconds of downtime in a year - and it knows this because the servers periodically phone home to inform Stratus that they're working as they should.
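Converting an uptime percentage into yearly downtime is simple arithmetic. The snippet below is a minimal sketch of that conversion, assuming a 365-day year; the function name is made up for illustration and has nothing to do with how Stratus gathers its figures.

```python
# Assumption: a non-leap year of 365 days.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def downtime_seconds_per_year(uptime_percent: float) -> float:
    """Return the yearly downtime, in seconds, implied by an uptime percentage."""
    return (100.0 - uptime_percent) / 100.0 * SECONDS_PER_YEAR


if __name__ == "__main__":
    # Compare "six nines" with the five-decimal figure quoted by Stratus.
    for pct in (99.9999, 99.99989):
        seconds = downtime_seconds_per_year(pct)
        print(f"{pct} per cent uptime: {seconds:.1f} seconds of downtime a year")
```

The two sample inputs show how sensitive the result is to the last decimal places of the percentage.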
Stratus servers contain two of everything, from the CPU down, all working in lockstep. Stratus is part-owned by Intel, among others, and its links with Chipzilla mean that changes were made to server processors at design time to provide links that allow lock-stepping to occur. The Stratus-designed chipsets invoke a failover if a component fails. The server then phones Stratus and tells the support centre what action is needed to fix the problem - it can even order replacement parts automatically. Meanwhile system memory, the OS and applications are shielded, and execution continues as before. ®
The Navy notwithstanding...
Today's CPUs are quite good at telling when they're doing something wrong. There are a lot of 'breadcrumbs' to make the proper determination. If a truly benign divergence happens in which both replicated halves are still machine-correct, then the one that has been running the OS and the applications the longest error-free is kept running. Experience has shown that these occurrences are rare, fortunately.
Stratus used to offer triple redundant servers as well, but field data and long-term analysis showed that the increased availability protection was negligible (in the noise, really) compared to the high increase in cost.
(disclaimer: I'm a design engineer at Stratus Technologies, so I do look at this data quite a bit over morning bagel and OJ) :-)
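The tie-break described above for a benign divergence can be sketched in a few lines. This is only an illustration of the stated rule, not Stratus's implementation: the `Half` class, its fields and `choose_survivor` are all hypothetical names, and real hardware would make this decision in the chipset rather than in software.

```python
from dataclasses import dataclass


@dataclass
class Half:
    """One replicated half of a lockstepped pair (hypothetical model)."""
    name: str
    machine_correct: bool      # did this half's own error checks pass?
    error_free_seconds: float  # time spent running the OS and apps without error


def choose_survivor(a: Half, b: Half) -> Half:
    """Pick the half that keeps running after the two halves diverge."""
    healthy = [h for h in (a, b) if h.machine_correct]
    if not healthy:
        raise RuntimeError("both halves report errors; no survivor to choose")
    # One healthy half wins outright. If both are machine-correct, keep the
    # one with the longest error-free run, as the comment describes.
    return max(healthy, key=lambda h: h.error_free_seconds)


if __name__ == "__main__":
    survivor = choose_survivor(Half("A", True, 100.0), Half("B", True, 500.0))
    print(f"Keep running half {survivor.name}")
```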
In a word, no.
Do FRUs tell you which part is wrong? Amazing solution to errors in computing: Just never sell FRUs that make errors!
Re: two of everything...
Presumably through a unique identifier for each FRU that is picked up from the firmware, handled by the systems management software and reported back to base...
Do I get a prize for STBO? :)