Supercomputer niche chucks rocks at Nehalem

SiCortex and the Intel 'problem'

As niche supercomputer-maker SiCortex works on the next generation of its line and watches the IT marketing machine gearing up for Intel's impending Nehalem-based Xeon EP, the company says that Chipzilla isn't moving in the right direction for high-performance computing (HPC) workloads.

"The major improvement is in the integration of the memory controller, so you don't have to send all the memory traffic over the northbridge," says Jud Leonard, co-founder and chief architect at SiCortex. "But this only solves one of the three issues that Intel faces - issues that we have already solved. Intel hasn't done anything about solving the performance-per-watt problem, and they haven't done anything in terms of on-chip communication."

That assessment is perhaps a bit harsh, since Intel has in fact moved to successively lower-powered processor designs. The Nehalem chips are an incredible improvement over the old Xeon chips - remember how awful the "Nocona" and "Paxville" Xeons were?

The dual-core Paxville Xeon DP ran at 2.8GHz, had 2MB of L2 cache, 800MT/sec of bandwidth into the front side bus (FSB), and a 135-watt thermal design power (TDP). The Paxville DP chips, which plugged into dual-socket servers in late 2005, supported HyperThreading - each socket provided four threads for operating systems and applications.

The future "Gainestown" Nehalem Xeon EP due on March 31 will run at between 1.86GHz and 3.2GHz with DDR3 main memory running at 800MHz, 1066MHz, or 1333MHz. TDPs will range from 60 to 130 watts.

That top-end, 130-watt part is just stupid unless you have a workload that needs the clocks, so the fair comparison to Paxville Xeon DPs is the 2.93GHz part that has four cores, eight threads, and 8MB of L3 cache. That Xeon EP will handle 6.4GT/sec of memory bandwidth thanks to its integrated memory controllers and the QuickPath Interconnect (QPI) Intel is finally putting into server chips. (Welcome to 2003, Chipzilla.)

When you think about it that way, maybe SiCortex has a point.

But in the last three and a half years, Intel has doubled the Xeon's cores, improved HyperThreading, added lots of L3 cache, boosted memory bandwidth by a factor of eight, and dropped TDPs by 30 per cent. Performance has improved if your application likes more threads, but if it doesn't know how to use those threads, the dollars per performance-per-watt calculation isn't impressive. If your applications are memory-constrained, however, you will no doubt see big performance gains from the Xeon EP.
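For anyone who wants to check the generational arithmetic, here is a rough sketch in Python that uses only the figures quoted above. The TDP of the 2.93GHz part is not stated here, so the value computed below is merely what a 30 per cent drop from Paxville's 135 watts would imply. The variable names are illustrative.

```python
# Per-socket figures quoted in the article for each generation.
paxville = {"cores": 2, "threads": 4, "transfers_mt_s": 800, "tdp_watts": 135}
nehalem_2_93 = {"cores": 4, "threads": 8, "transfers_mt_s": 6400}  # 6.4GT/sec

core_ratio = nehalem_2_93["cores"] / paxville["cores"]
thread_ratio = nehalem_2_93["threads"] / paxville["threads"]
bandwidth_ratio = nehalem_2_93["transfers_mt_s"] / paxville["transfers_mt_s"]

# Assumption: the "30 per cent" TDP drop is applied to Paxville's 135 watts.
implied_tdp_watts = paxville["tdp_watts"] * (1 - 0.30)

print(f"cores x{core_ratio:.0f}, threads x{thread_ratio:.0f}, "
      f"bandwidth x{bandwidth_ratio:.0f}, implied TDP ~{implied_tdp_watts:.1f}W")
```

The ratios say nothing about delivered performance, which depends on whether the application can actually use the extra threads.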

"But here's the problem," says Leonard. "When you get right down to it, these x64 chips were really designed for desktop environments, and they have to hang off an expensive network fabric in supercomputers." As an example, he said that Sun Microsystems' gazillion-port Magnum InfiniBand switch at the heart of its blade-style HPC systems costs more than an entire SiCortex system.

The SiCortex system-on-chip (SoC) nodes were upgraded last October, and sport six MIPS cores running at 700MHz with a built-in router handling mesh networking of the chips. That on-chip network means you don't have to do any external networking to get the processors to do the communication necessary for HPC workloads - usually MPI but also a fair amount of plain-vanilla TCP/IP.

The October upgrade, when SiCortex's chip fab TSMC moved to a 90nm process, boosted the speed of the chip from 500MHz to 700MHz. A full 5,832-core machine supports 8TB of DDR2 memory running at 667MHz or 800MHz, supplying 2.1Tbps of I/O bandwidth over the internal network and 8.1 teraflops of number-crunching power.
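Those figures hang together if each core retires roughly two floating-point operations per clock. Here is a back-of-the-envelope check, assuming peak teraflops is simply cores times clock times flops per cycle. The flops-per-cycle figure is inferred from the numbers above, not stated by SiCortex.

```python
# Figures quoted in the article for a fully configured machine.
CORES = 5832
CORES_PER_CHIP = 6
CLOCK_HZ = 700e6     # 700MHz
PEAK_FLOPS = 8.1e12  # 8.1 teraflops

# Number of six-core SoC nodes in the full system.
chips = CORES // CORES_PER_CHIP

# Assumption: peak = cores * clock * flops_per_cycle, solved for the last term.
flops_per_core_per_cycle = PEAK_FLOPS / (CORES * CLOCK_HZ)

print(chips)                               # 972
print(round(flops_per_core_per_cycle, 2))  # 1.98
```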

Because the processors in the SiCortex box run at close to the speed of the memory, the MIPS cores are not wasting cycles waiting for memory to feed them data. In x64 machines these days, memory tends to run about half as fast as the CPUs.

"Our attitude is that HPC applications are constrained by memory access, so you have to spread the application over a large number of cores," says Leonard. But doing so efficiently requires that on-chip network, as far as SiCortex is concerned.

The SiCortex super runs a variant of Gentoo Linux and has a tweaked version of the Lustre open-source clustered file system controlled by Sun and used by many supercomputer centers. Sun bought the company behind the Lustre project in September 2007.

SiCortex bought parallel-compiler maker PathScale in August 2007 and ported the compiler stack to its MIPS-based Gentoo rev. SiCortex also replaced the Linux boot sequence and added system-management tools, but Leonard says that the machine looks and feels like any normal Linux-based Beowulf cluster.

Don't expect SiCortex to suddenly announce a version of its box based on Intel's Nehalem chips. Leonard won't divulge many details about the next generation of SiCortex machines, but he did say that they'll be "substantially speeded up," have more cores per die, and take advantage of a "major step" in semiconductor processes from TSMC. The only hint he gave is that SiCortex will exceed a 1GHz clock speed.

Leonard also added that the company has created software to allow multiple SiCortex machines to be daisy-chained to share workloads, but said that code is not yet ready for prime time.

We're just guessing, but SiCortex could move to DDR3 memory and get it and the cores running at 1.33GHz. A process shrink from 90nm to 65nm could allow the clocks to be cranked that high, and maybe the core count could jump from 6 to 10 per die, as well. That would put the SiCortex kicker at about 13.6 teraflops.
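As a sanity check on that guess, the sketch below scales the current 8.1 teraflops linearly with cores per die and clock speed, holding the number of chips constant. Linear scaling is an assumption, and the function name is illustrative. On that basis, the jump from 6 to 10 cores alone lands at about 13.5 teraflops, close to the figure above, while the higher clock would push the theoretical peak further still.

```python
# Baseline figures quoted in the article for the current machine.
BASE_TERAFLOPS = 8.1
BASE_CORES_PER_DIE = 6
BASE_CLOCK_GHZ = 0.7


def projected_teraflops(cores_per_die, clock_ghz):
    """Scale peak teraflops linearly with cores per die and clock.

    Assumes the same number of chips and the same flops per cycle per core.
    """
    core_scale = cores_per_die / BASE_CORES_PER_DIE
    clock_scale = clock_ghz / BASE_CLOCK_GHZ
    return BASE_TERAFLOPS * core_scale * clock_scale


cores_only = projected_teraflops(10, BASE_CLOCK_GHZ)  # more cores, same clock
clock_only = projected_teraflops(6, 1.33)             # same cores, faster clock
both = projected_teraflops(10, 1.33)                  # more cores and faster clock

print(f"cores only: {cores_only:.2f}, clock only: {clock_only:.2f}, both: {both:.2f}")
```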

There's an even more interesting possibility for future SiCortex HPC machines. With TSMC now partnering with Intel to build Atom x64-based SoCs for embedded applications, it is just faintly possible that SiCortex might move from MIPS to Atom chips, thereby allowing customers to more easily move their x64 applications to SiCortex iron.

Leonard in no way suggested this was the plan. But it's an interesting possibility, and perhaps a good long-term plan for SiCortex or anyone else who wants to take on the HPC space with low-power parallel machines.

And it is often better to have Chipzilla as an ally than an enemy. ®
