The Register® — Biting the hand that feeds IT

Big-time Linux cluster breaks cover

16-node failover cluster seeks solvent, non-smoking application vendor.


SGI's FailSafe looks set to be the first high-availability clustering software for Linux to break cover.

The first public demos of the open-sourced Linux FailSafe are expected at the LinuxWorld Expo in August, we gather; the binaries were made available on request last week, and should be available on SGI's site this week.

Unlike the current crop of web server clusters, FailSafe was designed to host database and TP applications using the shared-everything model pioneered by DEC in its original VAXClusters (now TruClusters), a model adopted by almost everyone else except Tandem and Microsoft.

Largely at the request of SuSE, SGI announced earlier this year that it would release the source code, giving a jump start to other long-term and even more ambitious groundwork to create a VAXish high-availability platform for Linux.

According to SuSE's Alan Robertson, maintainer of the Linux-HA Web site and a lead on the FailSafe project, the source code is still undergoing legal scrutiny.

However, as Robertson acknowledges, the initial release of FailSafe marks the beginning of the work rather than the end. Unless a clustered file system such as GFS finds its way into the equation, allowing graceful concurrent access to shared disks, FailSafe must use a crude approach to ensuring data integrity: one node simply cuts the power to its contending rival, a technique that goes by the delightful acronym STONITH (Shoot The Other Node In The Head).
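To make the STONITH idea concrete, here is a minimal Python sketch of the decision a surviving node makes. It is not FailSafe's code or API: the PowerSwitch and Node classes, their method names and the five-second heartbeat timeout are all hypothetical, and the "power switch" is simulated with a dictionary rather than real remotely controlled hardware. What it illustrates is the ordering: the survivor powers off its silent peer before taking over the shared disk, so the two nodes can never write to it at the same time.

```python
from dataclasses import dataclass, field


@dataclass
class PowerSwitch:
    """Hypothetical remote power switch, simulated as node name -> powered on?"""
    powered: dict = field(default_factory=dict)

    def power_off(self, node: str) -> None:
        # Real STONITH hardware would cut the outlet feeding `node` here.
        self.powered[node] = False


@dataclass
class Node:
    """Hypothetical cluster node that watches a single peer's heartbeat."""
    name: str
    peer: str
    switch: PowerSwitch
    heartbeat_timeout: float = 5.0  # seconds; illustrative, not a FailSafe default
    last_peer_heartbeat: float = 0.0
    owns_shared_disk: bool = False

    def receive_heartbeat(self, now: float) -> None:
        # Record the time of the peer's latest "I'm alive" message.
        self.last_peer_heartbeat = now

    def check_peer(self, now: float) -> None:
        # Peer heard from recently enough: leave it alone.
        if now - self.last_peer_heartbeat <= self.heartbeat_timeout:
            return
        # Peer is silent. Shoot it first, so that a node which is merely
        # hung (not dead) cannot wake up and keep writing to the shared disk.
        self.switch.power_off(self.peer)
        # Only after fencing is it safe for this node to take the disk over.
        self.owns_shared_disk = True


# Usage: node "a" watches node "b".
switch = PowerSwitch(powered={"a": True, "b": True})
a = Node(name="a", peer="b", switch=switch)
a.receive_heartbeat(now=100.0)
a.check_peer(now=103.0)  # 3 s of silence: within the timeout, nothing happens
a.check_peer(now=106.0)  # 6 s of silence: "b" is powered off, "a" takes the disk
print(switch.powered, a.owns_shared_disk)  # {'a': True, 'b': False} True
```

The crudeness the article describes is visible here: the survivor has no way of asking a hung peer to stop politely, so it removes the peer's power instead. In a real two-node cluster both nodes can lose each other's heartbeat at once and race to shoot each other, which is why practical setups usually add a tiebreaker such as a quorum disk or a third node.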

And Linux FailSafe 1.0 is as much a grab for mindshare as a finished article.

The really ambitious long-term, ground-up Linux HA work came to the fore with the short-lived Linux Cluster Cabal last fall, and continues with Stephen Tweedie's HA architecture and Peter Braam's work on a VAXish Distributed Lock Manager and clustered file systems. There's some overlap here, and Robertson says the FailSafe project is keen to ensure interoperability with the erstwhile Cabalites.

Tweedie himself describes FailSafe as "incredibly important" if Linux is to match the highly available commercial Unixes, but points out that it neither scales beyond 16 nodes nor provides sophisticated load balancing. Robertson says he's keen to agree on APIs for the cluster services, such as quorum and heartbeat, that both projects have in common.

There's no doubt that Linux FailSafe looks like a pretty complete package, right down to the GUI front end for cluster management, and its 16-node cluster stands up well against SCO's NSC clusters, never mind Microsoft's two-node MSCS. However, it needs applications, and porting teams at the likes of Oracle, Informix and IBM need to see a durable-looking API before they can make a business case. With FailSafe, it looks like they've just got one. ®
