
Sun goes cluster crazy with WildCat

Interconnect tech makes long-awaited debut

ComputerWire: IT Industry Intelligence

As expected, Sun Microsystems Inc will today roll out its long-awaited "WildCat" high-performance system interconnect technology for its Sun Fire Unix server line, Timothy Prickett Morgan writes.

When Sun said it would be able to build UltraSparc servers with hundreds of processors, WildCat was the key technology designed to make that possible. Using WildCat, or Sun Fire Link as it is now known, Sun can build faster and more resilient commercial clusters, as well as bigger HPC clusters, than was possible in the past with third-party switch technology.

For several years WildCat has been expected to appear in special versions of the Sun Fire servers known as MaxCats, which Sun now calls its "Galaxy-class" servers. The MaxCats are Sun Fire 15000 "StarCat" servers configured with 100 1.05GHz UltraSparc-III+ processors and fitted with the WildCat interconnect. (A regular StarCat has 72 900MHz processors in the main SMP chassis, plus another 34 auxiliary processors that plug into I/O slots, for a total of 106 processors.)

We were told some time ago that the WildCat interconnect could link up to eight machines into a single system image, and Sun has confirmed that. When you do the math, a WildCat cluster with eight MaxCat servers delivers about 1.7 teraflops of peak computing power for HPC workloads. This is not as much computing power as IBM Corp, Hewlett Packard Co, SGI Inc, or Cray Inc can deliver in a single system image, but it does get Sun into the upper stratosphere where companies and research institutions want to buy teraflops of capacity.
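The "do the math" step can be made explicit. The sketch below assumes each UltraSparc-III+ processor can complete two floating-point operations per clock cycle (one add plus one multiply); the article does not state that figure, and the function name `peak_teraflops` is purely illustrative.

```python
def peak_teraflops(nodes: int, cpus_per_node: int, clock_ghz: float,
                   flops_per_cycle: int = 2) -> float:
    """Theoretical peak floating-point rate of a cluster, in teraflops.

    ASSUMPTION: flops_per_cycle=2 (one FP add and one FP multiply per cycle
    per UltraSparc-III+ processor); this is not stated in the article.
    """
    gigaflops = nodes * cpus_per_node * clock_ghz * flops_per_cycle
    return gigaflops / 1000.0


# Eight MaxCat servers, each with 100 processors at 1.05GHz.
maxcat_cluster = peak_teraflops(nodes=8, cpus_per_node=100, clock_ghz=1.05)
print(f"{maxcat_cluster:.2f} teraflops")
```

Under that assumption the result is 1.68 teraflops, which rounds to the "about 1.7 teraflops" quoted above. Peak figures like this ignore memory and interconnect stalls, so sustained application performance will be lower.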

WildCat, which is a derivative of the Fibre Channel interconnect used to link servers to their peripherals (mainly disk storage), was expected to be delivered in September 2001 alongside the StarCats. Sun has been tweaking and tuning WildCat since then. The 1.7-teraflop, eight-node clustering limit is not inherent in the WildCat design, says Steve Perrenod, group manager of high performance and technical computing at Sun's Enterprise Systems Products group.

Rather, it reflects the capacity of the first WildCat switch that Sun has brought to market. This Sun Fire Link Switch has its own 6.4GB/sec crossbar, very much like the crossbar used in the Sun Fire servers, with four bi-directional links that provide 4.8GB/sec of peak bandwidth and have delivered 2.8GB/sec of sustained bandwidth on MaxCat configurations running real-world HPC applications. Perrenod says that MPI latencies are under 4 microseconds with WildCat, compared to 17.9 microseconds with IBM's current SP2 switch for its Regatta clusters, which has 1GB/sec of bandwidth per channel. MPI, or Message Passing Interface, is the standard message-passing API for HPC parallel computing.
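To see why the latency gap matters, a common first-order model treats the time to send an n-byte message as a fixed latency plus n divided by bandwidth. The sketch below plugs in the figures quoted above. It is a rough illustration, not a benchmark: the WildCat bandwidth is a sustained figure while the SP2 figure is per-channel bandwidth, 4 microseconds is used as an upper bound for WildCat latency, and 1GB is taken as 10^9 bytes. The helper `transfer_time_us` and the `LINKS` table are hypothetical names.

```python
def transfer_time_us(message_bytes: int, latency_us: float,
                     bandwidth_gb_s: float) -> float:
    """Latency/bandwidth model: time = latency + size / bandwidth.

    ASSUMPTION: 1GB/sec = 1e9 bytes/sec, i.e. 1e3 bytes per microsecond.
    """
    bytes_per_us = bandwidth_gb_s * 1e3
    return latency_us + message_bytes / bytes_per_us


# (latency in microseconds, bandwidth in GB/sec), as quoted in the article.
LINKS = {
    "WildCat (sustained)": (4.0, 2.8),
    "IBM SP2 switch": (17.9, 1.0),
}

for size in (8, 1_000_000):
    for name, (latency, bandwidth) in LINKS.items():
        t = transfer_time_us(size, latency, bandwidth)
        print(f"{size:>9} bytes  {name:<20} {t:8.1f} us")
```

Under this model small messages are dominated by latency (roughly 4 versus 18 microseconds for an 8-byte message), while for a 1MB message bandwidth dominates (roughly 361 versus 1,018 microseconds).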

Sun has said in the past that the Remote Shared Memory API at the heart of the WildCat interconnect lets applications talk directly to the interconnect, bypassing the Solaris operating system on the cluster nodes and thereby reducing latency. The upshot is that on certain workloads (exactly which kinds is unclear), WildCat will apparently present what looks like a single system image to applications, at least as far as latencies are concerned.

WildCat does not work with just any Sun Fire server. Perrenod says that Sun Fire Link is only supported on the 24-way Sun Fire 6800, 36-way Sun Fire 12000, and 72-way Sun Fire 15000 servers. It looks like WildCat requires Sun to pull some of those CPUs, however: six processors are removed from each server in the largest MaxCat configuration (eight 100-processor Sun Fire 15000s), and four are removed from each server in the smallest MaxCat (eight 20-way Sun Fire 6800s).

The WildCat implementation on the 6800s is somewhat less sophisticated than on the 12Ks and 15Ks, says Perrenod, which probably explains the price difference between the WildCat interconnection cards for the two classes of machine. Sun Fire Link assemblies for the 6800 cost $56,000 apiece (you need one for each server in the cluster), while on the 12K and 15K machines they cost over $100,000 each. The WildCat interconnect could, in theory, be added to the eight-way Sun Fire 3800 and twelve-way Sun Fire 4800 servers, but Sun has not done so.
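Because one Sun Fire Link assembly is needed per server, interconnect cost scales linearly with cluster size. A minimal sketch using the prices quoted above follows. The 12K/15K price is given only as "over $100,000", so it is treated as a lower bound, and the Sun Fire Link Switch itself is left out because the article does not price it. The names `LINK_ASSEMBLY_PRICE_USD` and `link_assembly_cost` are illustrative only.

```python
# Per-server Sun Fire Link assembly prices quoted in the article, in US dollars.
# The 12K/15K price is only given as "over $100,000", so it is a lower bound here.
LINK_ASSEMBLY_PRICE_USD = {
    "Sun Fire 6800": 56_000,
    "Sun Fire 12000/15000": 100_000,  # lower bound
}


def link_assembly_cost(server_model: str, nodes: int) -> int:
    """Cost of the per-server link assemblies only (switch price excluded)."""
    if not 1 <= nodes <= 8:
        # Eight servers is the limit of the first-generation Sun Fire Link Switch.
        raise ValueError("first-generation switch links at most eight servers")
    return LINK_ASSEMBLY_PRICE_USD[server_model] * nodes


print(link_assembly_cost("Sun Fire 6800", 8))         # 448000
print(link_assembly_cost("Sun Fire 12000/15000", 8))  # 800000 (a lower bound)
```

So an eight-node 6800 cluster needs $448,000 of link assemblies, and an eight-node 12K or 15K cluster needs more than $800,000, before the switch is counted.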

Customers who want to cluster these machines will have to resort to SCI (Scalable Coherent Interface), Sun's current system interconnection technology, unless Sun changes its mind. With competitors clustering four-way and eight-way machines, Sun may be forced to do so, especially for HPC customers who are trying to pack as much computing power as possible into the smallest amount of space.

A number of HPC customers have been playing with WildCat for quite some time. In July 2002, the University of Cambridge and Cranfield University in the UK bought a MaxCat configuration with 2 teraflops of computing power employing the WildCat interconnect. This is Sun's largest HPC deal to date, and it is actually composed of three smaller WildCat clusters rather than one big one. A number of other organizations have been testing WildCat on collections of Sun Fire 6800 and 15000 servers, including the University of Stuttgart in Germany, the High Performance Computing Virtual Laboratory in Canada, and Aachen University of Technology in Germany.

WildCat's usefulness is not limited solely to the HPC market. With Sun Fire Link being much more capacious and much faster than SCI clustering, customers who have commercial Sun boxes supporting clustered databases - clustered for failover and high availability, not for scalability - will find WildCat appealing.

© Computerwire