Supercomputer niche chucks rocks at Nehalem

SiCortex and the Intel 'problem'


As niche supercomputer-maker SiCortex works on the next generation of its line and watches the IT marketing machine gearing up for Intel's impending Nehalem-based Xeon EP, the company says that Chipzilla isn't moving in the right direction for high-performance computing (HPC) workloads.


"The major improvement is in the integration of the memory controller, so you don't have to send all the memory traffic over the northbridge," says Jud Leonard, co-founder and chief architect at SiCortex. "But this only solves one of the three issues that Intel faces - issues that we have already solved. Intel hasn't done anything about solving the performance-per-watt problem, and they haven't done anything in terms of on-chip communication."

That assessment is perhaps a bit harsh, since Intel has in fact moved to successively lower-powered processor designs. The Nehalem chips are an incredible improvement over the old Xeon chips - remember how awful the "Nocona" and "Paxville" Xeons were?

The dual-core Paxville Xeon DP ran at 2.8GHz, had 2MB of L2 cache, 800MT/sec of bandwidth into the front side bus (FSB), and a 135-watt thermal design power (TDP). The Paxville DP chips, which plugged into dual-socket servers in late 2005, supported HyperThreading - each socket provided four threads for operating systems and applications.

The future "Gainestown" Nehalem Xeon EP due on March 31 will run at between 1.86GHz and 3.2GHz with DDR3 main memory running at 800MHz, 1066MHz, or 1333MHz. TDPs will range from 60 to 130 watts.

That top-end, 130-watt part is just stupid unless you have a workload that needs the clocks, so the fair comparison to Paxville Xeon DPs is the 2.93GHz part that has four cores, eight threads, and 8MB of L3 cache. That Xeon EP will handle 6.4GT/sec of memory bandwidth thanks to its integrated memory controllers and the QuickPath Interconnect (QPI) Intel is finally putting into server chips. (Welcome to 2003, Chipzilla.)

When you think about it that way, maybe SiCortex has a point.

But in the last three and a half years, Intel has doubled the Xeon's cores, improved HyperThreading, added lots of L3 cache, boosted memory bandwidth by a factor of eight, and dropped TDPs by 30 per cent. Performance has improved if your application likes more threads, but if it doesn't know how to use those threads the dollars per performance-per-watt calculation isn't impressive. If your applications are memory-constrained, however, you will no doubt see big performance gains from the Xeon EP.

"But here's the problem," says Leonard. "When you get right down to it, these x64 chips were really designed for desktop environments, and they have to hang off an expensive network fabric in supercomputers." As an example, he said that Sun Microsystems' gazillion-port Magnum InfiniBand switch at the heart of its blade-style HPC systems costs more than an entire SiCortex system.

The SiCortex system-on-chip (SoC) nodes were upgraded last October, and sport six MIPS cores running at 700MHz with a built-in router handling mesh networking of the chips. That on-chip network means you don't have to do any external networking to get the processors to do the communication necessary for HPC workloads - usually MPI but also a fair amount of plain-vanilla TCP/IP.

The October upgrade, when SiCortex's chip fab TSMC moved to a 90nm process, boosted the speed of the chip from 500MHz to 700MHz. A full 5,832-core machine supports 8TB of DDR2 memory running at 667MHz or 800MHz, supplying 2.1Tbps of I/O bandwidth over the internal network and 8.1 teraflops of number-crunching power.
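As a rough check on that 8.1 teraflops figure, peak throughput can be sketched as cores times clock times floating-point operations per cycle. The core count and clock come from the figures above; the two operations per core per cycle is an assumption chosen because it reproduces the quoted number, not something SiCortex is quoted as saying here.

```python
def peak_teraflops(cores: int, clock_ghz: float, flops_per_cycle: float) -> float:
    """Theoretical peak in teraflops: cores x GHz x flops/cycle gives gigaflops."""
    return cores * clock_ghz * flops_per_cycle / 1000.0

CORES = 5832          # full-size machine, from the article
CORES_PER_NODE = 6    # six MIPS cores per SoC, from the article
CLOCK_GHZ = 0.7       # 700MHz after the October upgrade
FLOPS_PER_CYCLE = 2   # ASSUMPTION: inferred from the quoted 8.1 teraflops

nodes = CORES // CORES_PER_NODE
peak = peak_teraflops(CORES, CLOCK_GHZ, FLOPS_PER_CYCLE)
print(f"{nodes} nodes, {peak:.2f} teraflops peak")
```

Under that assumption the full machine works out to 972 six-core nodes and a little over 8.1 teraflops of theoretical peak, which says nothing about sustained performance on real codes.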

Because the processors in the SiCortex box run at close to the speed of the memory, the MIPS cores are not wasting cycles waiting for memory to feed them data. In x64 machines these days, memory tends to run about half as fast as the CPUs.
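The gap is easy to put in numbers using the clock figures quoted in this story. The sketch below treats the quoted memory speeds and core clocks as directly comparable, which is a simplification, and picks the 2.93GHz Xeon EP with 1333MHz DDR3 as the x64 example; that pairing is an assumption for illustration.

```python
def memory_to_core_ratio(memory_mhz: float, core_mhz: float) -> float:
    """Quoted memory speed as a fraction of the core clock."""
    return memory_mhz / core_mhz

# SiCortex: 700MHz MIPS cores with DDR2 at 667MHz or 800MHz
sicortex_ratios = [memory_to_core_ratio(m, 700) for m in (667, 800)]

# x64 example (ASSUMED pairing): 2.93GHz Xeon EP with 1333MHz DDR3
xeon_ratio = memory_to_core_ratio(1333, 2930)

for ratio in sicortex_ratios:
    print(f"SiCortex memory/core ratio: {ratio:.2f}")
print(f"Xeon EP memory/core ratio:  {xeon_ratio:.2f}")
```

On these figures the SiCortex memory runs at roughly the core clock, while the Xeon example comes in a bit under half.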

"Our attitude is that HPC applications are constrained by memory access, so you have to spread the application over a large number of cores," says Leonard. But doing so efficiently requires that on-chip network, as far as SiCortex is concerned.

The SiCortex super runs a variant of Gentoo Linux and has a tweaked version of the Lustre open-source clustered file system controlled by Sun and used by many supercomputer centers. Sun bought the company behind the Lustre project in September 2007.

SiCortex bought parallel-compiler maker PathScale in August 2007 and ported the compiler stack to its MIPS-based Gentoo rev. SiCortex also replaced the Linux boot sequence and added system-management tools, but Leonard says that the machine looks and feels like any normal Linux-based Beowulf cluster.

Don't expect SiCortex to suddenly announce a version of its box based on Intel's Nehalem chips. Leonard won't divulge many details of the next generation of SiCortex machines, but he did say that they'll be "substantially speeded up," have more cores per die, and take advantage of a "major step" in semiconductor processes from TSMC. The only hint he gave is that SiCortex will exceed a 1GHz clock speed.

Leonard added that the company has created software to allow multiple SiCortex machines to be daisy-chained to share workloads, but said that code is not yet ready for prime time.

We're just guessing, but SiCortex could move to DDR3 memory and get it and the cores running at 1.33GHz. A process shrink from 90nm to 65nm could allow the clocks to be cranked that high, and maybe the core count could jump from 6 to 10 per die, as well. That would put the SiCortex kicker at about 13.6 teraflops.
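For anyone who wants to play with that guess, here is a minimal sketch that scales the current 8.1 teraflops by core count and clock speed. It assumes the number of chips in the machine and the work done per clock stay the same, neither of which SiCortex has said anything about.

```python
BASE_TFLOPS = 8.1        # quoted peak for the current full-size machine
BASE_CORES_PER_DIE = 6   # current SoC
BASE_CLOCK_GHZ = 0.7     # current clock

def scaled_peak(base_tflops: float, cores_per_die: int, clock_ghz: float) -> float:
    """Scale peak linearly with cores per die and clock.

    ASSUMPTION: same number of chips and same flops per cycle as today.
    """
    core_factor = cores_per_die / BASE_CORES_PER_DIE
    clock_factor = clock_ghz / BASE_CLOCK_GHZ
    return base_tflops * core_factor * clock_factor

cores_only = scaled_peak(BASE_TFLOPS, 10, BASE_CLOCK_GHZ)   # 10 cores, 700MHz
cores_and_clock = scaled_peak(BASE_TFLOPS, 10, 1.33)        # 10 cores, 1.33GHz

print(f"10 cores at 700MHz:  {cores_only:.1f} teraflops")
print(f"10 cores at 1.33GHz: {cores_and_clock:.2f} teraflops")
```

Under these assumptions the jump to 10 cores per die alone lands close to the 13.6 teraflops mentioned above, and also raising the clock to 1.33GHz would push the theoretical peak to a little under 26 teraflops.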

There's an even more interesting possibility for future SiCortex HPC machines. With TSMC now partnering with Intel to build Atom x64-based SoCs for embedded applications, it is just faintly possible that SiCortex might move from MIPS to Atom chips, thereby allowing customers to more easily move their x64 applications to SiCortex iron.

Leonard in no way suggested this was the plan. But it's an interesting possibility, and perhaps a good long-term plan for SiCortex or anyone else who wants to take on the HPC space with low-power parallel machines.

And it is often better to have Chipzilla as an ally than an enemy. ®
