Sun goes cluster crazy with WildCat

Interconnect tech makes long-awaited debut

ComputerWire: IT Industry Intelligence

As expected, Sun Microsystems Inc will today roll out its long-awaited "WildCat" high-performance system interconnect technology for its Sun Fire Unix server line, Timothy Prickett Morgan writes.

When Sun said it would be able to build UltraSparc servers with hundreds of processors, WildCat was the integral technology designed to make this a reality. Using WildCat, or Sun Fire Link as it is now known, Sun can build faster and more resilient commercial clusters, as well as bigger HPC clusters, than was possible in the past using third-party switch technology.

WildCat has been expected in special versions of the Sun Fire servers known as the MaxCats for several years; Sun is now referring to the MaxCats as its "Galaxy-class" servers. The MaxCats are Sun Fire 15000 "StarCat" servers configured with 100 1.05GHz UltraSparc-III+ processors and fitted with the WildCat interconnect. (A regular StarCat has 72 900MHz processors in the main SMP chassis, plus another 34 auxiliary processors that plug into I/O slots for a total of 106 processors.)

We were told some time ago that the WildCat interconnect could link up to eight machines into a single system image, and Sun has confirmed that. When you do the math, a WildCat cluster with eight MaxCat servers delivers about 1.7 teraflops of peak computing power for HPC workloads. This is not as much computing power as IBM Corp, Hewlett Packard Co, SGI Inc, or Cray Inc can deliver in a single system image, but it does get Sun into the upper stratosphere where companies and research institutions want to buy teraflops of capacity.
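That figure squares with a simple back-of-the-envelope calculation, assuming the usual convention of two floating-point operations per clock cycle per UltraSparc-III+ processor (one add plus one multiply); the sketch below works it through:

    # Back-of-the-envelope peak for an eight-node MaxCat cluster
    # (assumes 2 floating-point operations per clock per UltraSparc-III+,
    # which is how peak figures are conventionally quoted)
    nodes = 8
    cpus_per_node = 100
    clock_ghz = 1.05
    flops_per_clock = 2

    peak_gflops = nodes * cpus_per_node * clock_ghz * flops_per_clock
    print(peak_gflops / 1000, "teraflops peak")   # about 1.68 teraflops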

WildCat, which is a derivative of the Fibre Channel interconnect used to link servers to their peripherals (mainly disk storage), was expected to be delivered in September 2001 alongside the StarCats. Sun has been tweaking and tuning WildCat since that time. That 1.7 teraflops, eight-node clustering limit is not one inherent in the WildCat design, says Steve Perrenod, group manager of high performance and technical computing at Sun's Enterprise Systems Products group.

It is rather a limitation of the capacity of the first WildCat switch that Sun has delivered to the market. This Sun Fire Link Switch has its own 6.4GB/sec crossbar - very much like the crossbar used in the Sun Fire servers - with four bi-directional links that provide 4.8GB/sec of peak bandwidth and have delivered 2.8GB/sec of sustained bandwidth on MaxCat configurations running real-world HPC applications. Perrenod says that MPI latencies are under 4 microseconds for WildCat, compared to 17.9 microseconds with IBM's current SP2 switch for its Regatta clusters, which has 1GB/sec of bandwidth per channel. MPI, or Message Passing Interface, is the standard message-passing layer for HPC parallel computing.
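Latency and bandwidth figures of this sort are typically produced with a simple MPI ping-pong test between two nodes: half the round-trip time for a tiny message approximates the one-way latency, while large messages approximate sustained bandwidth. A minimal sketch of such a test, using the mpi4py bindings purely for illustration (this is not Sun's or IBM's benchmark), looks like this:

    # Minimal MPI ping-pong latency sketch (illustrative only).
    # Run across two nodes with: mpiexec -n 2 python pingpong.py
    from mpi4py import MPI
    import numpy as np
    import time

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    msg = np.zeros(8, dtype='b')   # tiny 8-byte message for the latency test
    reps = 1000

    comm.Barrier()
    start = time.perf_counter()
    for _ in range(reps):
        if rank == 0:
            comm.Send(msg, dest=1)
            comm.Recv(msg, source=1)
        else:
            comm.Recv(msg, source=0)
            comm.Send(msg, dest=0)
    elapsed = time.perf_counter() - start

    if rank == 0:
        # Half the average round trip approximates the one-way MPI latency
        print("one-way latency:", elapsed / reps / 2 * 1e6, "microseconds")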

Sun has said in the past that the Remote Shared Memory API at the heart of the WildCat interconnect allows applications to talk directly to that interconnect, bypassing the Solaris operating system on the nodes in a cluster and thereby reducing latencies. The point is that on certain workloads - and exactly what kinds remains unclear - WildCat will apparently present what looks like a single system image to applications, at least as far as latencies are concerned.
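Sun has not spelled out the RSM API itself here, but the programming model it describes - writing directly into a memory segment on a remote node, with no kernel call and no matching receive on the far side - is close in spirit to MPI's one-sided "put" operations. The sketch below uses mpi4py windows purely as an analogy for that style; it is not the Remote Shared Memory API:

    # One-sided "put" into a remote node's memory segment (mpi4py),
    # shown only as an analogy for the remote-shared-memory style;
    # this is NOT Sun's RSM API.
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    # Each rank exposes a small memory segment that peers can write into directly
    segment = np.zeros(4, dtype='i')
    win = MPI.Win.Create(segment, comm=comm)

    win.Fence()
    if rank == 0:
        payload = np.arange(4, dtype='i')
        win.Put(payload, 1)   # write straight into rank 1's exposed segment
    win.Fence()

    if rank == 1:
        print("rank 1 segment now holds:", segment)
    win.Free()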

WildCat does not work with just any Sun Fire server. Perrenod says that Sun Fire Link is only supported on the 24-way Sun Fire 6800, 36-way Sun Fire 12000, and 72-way Sun Fire 15000 servers. It looks like WildCat requires Sun to unplug some of those CPUs, however, because six processors are removed in the largest MaxCat configuration (eight 100-processor Sun Fire 15000s) and six processors are also removed from the smallest MaxCat (eight 20-way Sun Fire 6800s).

The WildCat implementation on the 6800s is somewhat less sophisticated, says Perrenod, than it is on the 12Ks and 15Ks, which probably explains the pricing differences for WildCat interconnection cards on the machines. Sun Fire Link assemblies for the 6800 cost $56,000 apiece (you need one for each server in the cluster), while on the 12K and 15K machines they cost over $100,000. The WildCat interconnect can, in theory, be added to the eight-way Sun Fire 3800 and twelve-way Sun Fire 4800 servers, but it has not been.

Customers who want to cluster these machines will have to resort to the SCI interconnect, Sun's current proprietary system interconnection technology, unless Sun changes its mind. With its competition clustering four-way and eight-way machines, Sun may be forced to do so, especially for HPC customers who are trying to pack as much computing power as possible into the smallest amount of space.

A number of HPC customers have been playing with WildCat for quite some time. In July 2002, the University of Cambridge and Cranfield University in the UK bought a MaxCat configuration with 2 teraflops of computing power employing the WildCat interconnect. This is Sun's largest HPC deal to date, and it is actually composed of three smaller WildCat clusters rather than one big WildCat cluster. Several other organizations have been testing WildCat with a collection of Sun Fire 6800 and 15000 servers, including the University of Stuttgart in Germany, the High Performance Computing Virtual Laboratory in Canada, and Aachen University of Technology in Germany.

WildCat's usefulness is not limited solely to the HPC market. With Sun Fire Link offering much more bandwidth and much lower latency than SCI clustering, customers who run clustered databases on commercial Sun boxes - clustered for failover and high availability, not for scalability - will find WildCat appealing.

© Computerwire
