The VAX of Life: Sun's cluster guru talks Full Moon

Net Effect™ meets Reg Effect™

Yousef Khalidi, Sun Microsystems Distinguished Engineer, Full Moon chief architect, and Register reader, took us on a ride through the history of Sun Cluster 3.0 yesterday: first as it might be seen by visiting aliens, and then from a programmer's perspective. Which was nice: plenty of detail gets left out even from the "technical spec sheets" that accompany press launches these days, and so to counter the Net Effect™, here's the Reg Effect™.

First of all, Khalidi wanted to clear up the impression that the new offering introduces a new proprietary cluster file system of Sun's own making. It doesn't. Khalidi himself had hinted at this possibility in recent talks, such as this one: "next-generation clusters could rely on a Cluster File System (CFS) to enable global access to all files, devices and network resources in the system, and create a full single-system image."

But that isn't the case. "That was an explicit design decision. The last thing we want to do is invent a new proprietary file system. So if something becomes important - like the new Linux file systems - we can plug it in." Sun Cluster 3.0 does employ a "Global File Service", but that boils down to using Solaris' native file system and, if you want it - and if you use disks on non-Solaris systems, you almost certainly will - Veritas VxFS. He left the door open for Linux file systems to plug in to Full Moon in the future.

A bicycle made for two... er, eight nodes

So what does Sun Cluster 3.0 most resemble, when viewed from space? For the sake of argument, is it more like Tandem NSC or, say, VAX clusters and their spiritual descendants?

"It's both shared-nothing and shared-everything. It's shared-nothing in that the hardware topology can be either. But unlike most everybody else, ours does not require a fully connected SAN," says Khalidi.

However, it requires no modifications to existing applications, which is probably the biggest difference between it and systems based on a VMS-ish distributed lock manager (DLM).

"A DLM is several things... an API. We've been talking to ISVs for five years and these ISVs already write for a DLM for VAX, Oracle has its own lock manager... that's fine. But they don't want another API," he says.

It's a measure of the enduring legacy of the world's first commercial cluster from DEC that Yousef (and Sun's marketing lead Andy Ingram) referred to it as VAX, or VAX-like, throughout, even though the technology is now called VMSClusters or TruClusters, migrated to Alpha many years ago, and VAX systems are no longer in production.

"To implement locking Sun Cluster 3.0 we use simple Unix APIs. With either the Solaris file system or Veritas' file system." So programmers can map data across instances of the Solaris OS - the standard Unix way of sharing memory across nodes - although this isn't encouraged as an IPC mechanism.

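To make the "simple Unix APIs" point concrete, here is a minimal sketch in Python, assuming Khalidi means the POSIX advisory locks (`lockf`/`fcntl`) and `mmap` calls that Unix programmers already use; the article does not name the exact calls. The helper names `make_counter_file` and `increment_shared_counter` are hypothetical, and the sketch runs on a single machine. Whether the same lock and mapping are honoured across cluster nodes is the file system's job, per Sun's claim, and not something this code demonstrates.

```python
import fcntl
import mmap
import os
import struct
import tempfile


def make_counter_file() -> str:
    """Create a temp file holding one 64-bit counter set to zero (hypothetical helper)."""
    fd, path = tempfile.mkstemp()
    os.write(fd, struct.pack("<Q", 0))
    os.close(fd)
    return path


def increment_shared_counter(path: str) -> int:
    """Lock the file, map it into memory, bump the counter, and return the new value."""
    fd = os.open(path, os.O_RDWR)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX)  # blocks until this process holds the lock
        try:
            # On Unix, mmap.mmap defaults to a shared, read-write mapping.
            with mmap.mmap(fd, 8) as view:
                (value,) = struct.unpack_from("<Q", view, 0)
                value += 1
                struct.pack_into("<Q", view, 0, value)
                view.flush()  # push the change back to the file
            return value
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

Nothing in the sketch is cluster-specific, which is the design argument as Khalidi tells it: an application that already locks and maps files this way would not need a new API.
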
DEC...who?

Nor does Sun Cluster mirror processes in the manner of Himalaya (formerly Tandem) NSC machines. Instead, the system logs file behaviour: open, read, write and sync calls. This avoids duplication, argues Khalidi, while preserving everything except session state information. And any real transaction will be obeying these semantics, so that's OK. Since the socket calls that Internet applications use follow Unix file semantics (although yes, we know, IP preceded Unix), that ought to be pretty watertight.
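
As a rough illustration of what logging file behaviour could look like (a toy sketch, not Sun's mechanism, which the article doesn't detail), the Python below records open, read, write and sync calls against one file and replays them onto a second file. `LoggedFile` and `replay` are hypothetical names, and a real system would presumably ship the log to another node rather than keep it in a list.

```python
import os
from typing import List, Tuple

# One log entry: ("open",), ("read", offset, length),
# ("write", offset, data) or ("sync",).
Op = Tuple


class LoggedFile:
    """Wraps one file and appends every call to an in-memory log."""

    def __init__(self, path: str, log: List[Op]) -> None:
        self.log = log
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        self.log.append(("open",))

    def read(self, offset: int, length: int) -> bytes:
        self.log.append(("read", offset, length))
        return os.pread(self.fd, length, offset)

    def write(self, offset: int, data: bytes) -> None:
        self.log.append(("write", offset, data))
        os.pwrite(self.fd, data, offset)

    def sync(self) -> None:
        self.log.append(("sync",))
        os.fsync(self.fd)

    def close(self) -> None:
        os.close(self.fd)  # not logged: the article lists only four calls


def replay(log: List[Op], target: str) -> None:
    """Re-apply a log to `target`; reads are skipped because they change nothing."""
    fd = None
    try:
        for op in log:
            if op[0] == "open":
                if fd is None:
                    fd = os.open(target, os.O_RDWR | os.O_CREAT, 0o600)
            elif fd is None:
                raise ValueError("log must start with an open")
            elif op[0] == "write":
                os.pwrite(fd, op[2], op[1])
            elif op[0] == "sync":
                os.fsync(fd)
    finally:
        if fd is not None:
            os.close(fd)
```

The log holds the file's contents and ordering but nothing about who was connected at the time, which matches the "everything but session state" caveat.
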

That's just one of several fashionable, or once-fashionable, approaches to clusters that the Full Moon team was happy to trample in the design work. Another is "process migration", an approach adopted by, for example, the MOSIX Linux cluster project, which in the event of a node failure fails over individual processes.

"That was removed from the prototype - on purpose," says Khalidi.

As for Compaq's nest of clusters, "Is that five or seven clustering products they sell?" he asks. Most Q users are still using TruClusters 4x, even though when the analysts last looked at their scorecards, Compaq's TruCluster 5x was top of the pops. Sun doesn't want to concede that Compaq has any traction in the internet business, and so it's been written out of Sun's product comparison sheets completely. And it still has "quirks", as he describes them, such as requiring Q's ADFS to do read-write access.

The design goal was to cluster-enable bog-standard Sun kit and applications that are already in use, over standard interconnects (SCI is on its way), using skills that any BOFH can relate to (such as mounting file systems). Anything special that the competition may boast, goes the line, is a "bragging competition".

Those patents in full...

Forty patents have been applied for, says Khalidi. He mentioned process active pair technology (which we know nothing about, but if anyone wants to enlighten us...), mini transaction technology (ditto) and interesting quorum techniques. Quorum is commonly understood in the parallel processing world to be the way a cluster decides who makes up the membership of the collective, although Sun's definition is different: for Sun, quorum isn't about membership, and Khalidi prefers fencing to quorum for this.

He says that the heartbeat code, another staple of HA cluster planning, actually runs at the highest process priority, rather than being some ad hoc or add-on process, which we thought was interesting. In fact, the toughest problems the team had to crack, he said, were around heartbeat issues. Not the "someone's just yanked out the Ethernet cable" kind, but the kind where a node goes quiet because it's under a high workload.
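
The busy-node problem is easy to see in a toy timeout-based failure detector. The Python sketch below is an illustration under stated assumptions, not Sun's heartbeat code: `HeartbeatMonitor`, the node names and the timeout value are all hypothetical. Timestamps are passed in explicitly so the behaviour is deterministic.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HeartbeatMonitor:
    """Suspects any node that has been silent for longer than `timeout` seconds."""

    timeout: float
    last_seen: Dict[str, float] = field(default_factory=dict)

    def beat(self, node: str, now: float) -> None:
        """Record that `node` sent a heartbeat at time `now`."""
        self.last_seen[node] = now

    def suspects(self, now: float) -> List[str]:
        """Return the nodes whose last heartbeat is older than the timeout."""
        return sorted(
            node
            for node, seen in self.last_seen.items()
            if now - seen > self.timeout
        )
```

With a two-second timeout, a node that last beat at time 0 is suspected at time 3 whether it has crashed or is merely too loaded to send; the monitor cannot tell the two apart until a late beat arrives and clears the suspicion. That is presumably why running the heartbeat at the highest priority matters.
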

Trawling around the patent database we found plenty of patents involving Full Moon 3's precursors, Khalidi's Solaris MC system in particular: specifically, stuff about recovery, clustered file systems and memory mapping. Solaris MC was a clustered Solaris using CORBA for message passing. Anyone want to fill in the blanks? ®
