
The VAX of Life: Sun's cluster guru talks Full Moon

Net Effect™ meets Reg Effect™


Yousef Khalidi, Sun Microsystems Distinguished Engineer, Full Moon chief architect, and Register reader, took us on a ride through the history of Sun Cluster 3.0 yesterday: first as might be seen by visiting aliens, and then from a programmer's perspective. Which was nice: plenty of detail gets left out even from the "technical spec sheets" that accompany press launches these days, and so to counter the Net Effect™, here's the Reg Effect™.

First of all, Khalidi wanted to clear up the impression that the new offering introduces a new proprietary cluster file system of Sun's own making. It doesn't. Khalidi himself had hinted at this possibility in recent talks, such as this one: "next-generation clusters could rely on a Cluster File System (CFS) to enable global access to all files, devices and network resources in the system, and create a full single-system image."

But that isn't the case. "That was an explicit design decision. The last thing we want to do is invent a new proprietary file system. So if something becomes important - like the new Linux file systems - we can plug it in." Sun Cluster 3.0 does indeed employ a "Global File Service", but that boils down to using Solaris' native file system and, if you want it - and if you use disks on non-Solaris systems, you almost certainly will - Veritas VxFS. He left the door open for Linux file systems to plug in to Full Moon in the future.

A bicycle made for two... er, eight nodes

So what does Sun Cluster 3.0 most resemble, when viewed from space? For the sake of argument, is it more like Tandem NSC or, say, VAX clusters and their spiritual descendants?

"It's both shared-nothing and shared-everything. It's shared-nothing in that the hardware topology can be either. But unlike most everybody else, ours does not require a fully connected SAN," says Khalidi.

However, it requires no modifications to existing applications, which is probably the biggest difference between it and systems based on a VMS-ish distributed lock manager (DLM).

"A DLM is several things... an API. We've been talking to ISVs for five years and these ISVs already write for a DLM for VAX, Oracle has its own lock manager... that's fine. But they don't want another API," he says.

It's a measure of the enduring legacy of the world's first commercial cluster from DEC that Yousef (and Sun's marketing lead Andy Ingram) referred to it as VAX, or VAX-like, throughout. Even though the technology is now called VMSClusters or TruClusters, migrated to Alpha many years ago, and VAX systems are no longer in production.

"To implement locking in Sun Cluster 3.0 we use simple Unix APIs. With either the Solaris file system or Veritas' file system." So programmers can map data across instances of the Solaris OS - the standard Unix way of sharing memory across nodes - although this isn't encouraged as an IPC mechanism.
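
For readers wondering what "simple Unix APIs" can look like in practice, here is a minimal sketch. It is not Sun's code. It keeps a counter in an ordinary file, guards it with a POSIX advisory lock and updates it through a memory mapping. The function name `bump_counter`, the 8-byte file layout and the choice of Python are our own assumptions for illustration, and whether such a lock is honoured across cluster nodes depends entirely on the file system underneath.

```python
import fcntl
import mmap
import os
import struct
import tempfile

COUNTER_SIZE = 8  # hypothetical layout: one little-endian unsigned 64-bit integer


def bump_counter(path):
    """Increment a counter stored in a file while holding an exclusive lock."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Block until we hold an exclusive advisory lock on the whole file.
        fcntl.lockf(fd, fcntl.LOCK_EX)
        if os.fstat(fd).st_size < COUNTER_SIZE:
            # Fresh file: extend it, which zero-fills the counter.
            os.ftruncate(fd, COUNTER_SIZE)
        # MAP_SHARED means stores to the mapping are written back to the file.
        with mmap.mmap(fd, COUNTER_SIZE, flags=mmap.MAP_SHARED) as view:
            (value,) = struct.unpack_from("<Q", view, 0)
            value += 1
            struct.pack_into("<Q", view, 0, value)
            view.flush()
        return value
    finally:
        fcntl.lockf(fd, fcntl.LOCK_UN)
        os.close(fd)
```

Every process that calls `bump_counter` on the same path takes its turn behind the lock, so no update is lost. The application sees only open, lock, map and unlock, with no separate lock manager API to learn.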

DEC...who?

Nor does Sun Cluster mirror processes in the manner of Himalaya (formerly Tandem) NSC machines. Instead, the system logs file behaviour: open, read, write and sync calls. This avoids duplication, argues Khalidi, and preserves everything except session state information. And any real transaction will be obeying these semantics, so that's OK. Since the socket calls that Internet applications use follow Unix file semantics (although yes, we know, IP preceded Unix), that ought to be pretty watertight.
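
To make the idea of logging file calls rather than mirroring processes a little more concrete, here is a toy sketch. It is entirely our own invention and not a description of Sun's implementation. A primary records each file call, and a standby rebuilds file contents by replaying the log. The names `FileOp`, `Replica` and `replay`, and the in-memory "file system", are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FileOp:
    call: str            # "open", "read", "write" or "sync"
    path: str
    offset: int = 0
    data: bytes = b""


@dataclass
class Replica:
    """In-memory stand-in for a standby node's view of the files."""
    files: dict = field(default_factory=dict)

    def apply(self, op):
        buf = self.files.setdefault(op.path, bytearray())
        if op.call == "write":
            end = op.offset + len(op.data)
            if len(buf) < end:
                buf.extend(b"\0" * (end - len(buf)))  # pad up to the write
            buf[op.offset:end] = op.data
        # In this toy model "open", "read" and "sync" change no file contents.


def replay(log, replica):
    """Apply every logged call, in order, to the replica."""
    for op in log:
        replica.apply(op)
    return replica


log = [
    FileOp("open", "/data/orders"),
    FileOp("write", "/data/orders", 0, b"hello"),
    FileOp("write", "/data/orders", 5, b" world"),
    FileOp("sync", "/data/orders"),
]
standby = replay(log, Replica())
```

After the replay the standby holds the same bytes the primary wrote. Nothing about the process that issued the calls is copied, which is the sense in which session state is the one thing not preserved.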

That's just one of several fashionable, or once-fashionable, approaches to clusters that the Full Moon team was happy to trample in the design work. Another is "process migration", an approach adopted by, for example, the MOSIX Linux cluster project, which in the event of a node failure fails over individual processes.

"That was removed from the prototype - on purpose," says Khalidi.

As for Compaq's nest of clusters, "Is that five or seven clustering products they sell?" he asks. Most Q users are still using TruClusters 4x, even though when the analysts last looked at their scorecards, Compaq's TruCluster 5x was top of the pops. Sun doesn't want to concede that Compaq has any traction in the internet business, and so Compaq has been written out of Sun's product comparison sheets completely. And TruCluster still has "quirks", as he describes them, such as requiring Q's AdvFS for read-write access.

The design goal was to cluster-enable bog-standard Sun kit and applications that are already in use, over standard interconnects (SCI is on its way), using skills that any BOFH can relate to (such as mounting file systems). Anything special that the competition may boast, goes the line, is a "bragging competition".

Those patents in full...

Forty patents have been applied for, says Khalidi. He mentioned process active pair technology (which we know nothing about, but if anyone wants to enlighten us...), mini transaction technology (ditto) and interesting quorum techniques. Quorum is commonly understood in the parallel processing world to be the way a cluster decides who comprises membership of the collective, although Sun's definition is different. Quorum isn't about membership, and Khalidi prefers fencing to quorum for this.

He says that the heartbeat code, another staple of HA cluster planning, actually runs at the highest process priority, rather than being some ad hoc or add-on process, which we thought was interesting. In fact, the toughest problems the team had to crack, he said, were around heartbeat issues. Not the "someone's just yanked out the Ethernet cable" kind, but the kind where a node goes quiet because it's under a high workload.
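
The busy-node problem is easy to reproduce with a generic timeout-based detector. This is a minimal sketch under our own assumptions, not Sun's heartbeat code. The class `HeartbeatMonitor`, the three-second timeout and the node names are hypothetical, and timestamps are passed in explicitly so the example is deterministic.

```python
class HeartbeatMonitor:
    """Suspects any node whose last heartbeat is older than `timeout` seconds."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}

    def beat(self, node, now):
        # Record the time at which a heartbeat from `node` arrived.
        self.last_seen[node] = now

    def suspects(self, now):
        # Nodes that have been silent for longer than the timeout.
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


monitor = HeartbeatMonitor(timeout=3.0)
monitor.beat("node-a", 0.0)
monitor.beat("node-b", 0.0)

# node-a reports every second; node-b is alive but too busy to send anything.
for t in (1.0, 2.0, 3.0, 4.0):
    monitor.beat("node-a", t)

suspected_while_busy = monitor.suspects(4.0)   # node-b looks dead, but isn't

# node-b finally gets scheduled and sends a late heartbeat.
monitor.beat("node-b", 5.0)
suspected_after_late_beat = monitor.suspects(5.0)
```

At the four-second mark the monitor cannot tell a dead node-b from an overloaded one. That is presumably why it matters that the heartbeat runs at the highest priority: the sender still gets scheduled when everything else on the node is starved.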

Trawling around the patent database we found plenty of patents which involved Full Moon 3's precursors, Khalidi's Solaris MC system in particular: specifically, stuff about recovery, clustered file systems and memory mapping. Solaris MC was a clustered Solaris using CORBA for message passing. Anyone want to fill in the blanks? ®
