POWER4 debuts in IBM Regatta
Big Blue's Big Bang eschews SMP numbers game
Despite only a modest name change to IBM's high-end Unix server line, from p680 to p690, IBM's new "Regatta" kit showcases a new generation of Risc chips in the shape of POWER4, and it really is Big Blue's Big Bang.
The chip debuts in the new pSeries at 1.1GHz and 1.3GHz, but clock frequencies aren't so relevant here. The system has been optimised around bandwidth, and boasts a number of firsts for a mainstream chip.
It's the first processor to feature two cores on a die, and the first to eschew the conventional bus, switch or crossbar options in favour of a variable frequency 'distributed switch', or as Big Blue calls it, a "wave-pipelined expansion bus". And the system as a whole offers logical partitioning features familiar to AS/400 and S/390 customers but unusual, if not unprecedented, on a Unix machine. (SGI engineers will probably correct us.)
Speeds and feeds
The two cores share 1.5MB of L2 cache and an index into the L3 off-chip cache, and there are four such chips (and therefore eight CPUs) in a physical package. This package is the same four-pin MCM used in the S/390, aka the ZzzzzSeries. IBM says a single MCM is therefore equivalent to six of Sun's four-way boards.
The bus frequency is a fixed ratio of the chip's clock, so the CPU frequency dictates the internal bandwidth.
Each chip has three interconnects clocked at a 2:1 ratio to the chip's clock speed. So the chips communicate internally (inside the MCM) or with the expansion bus at half the core clock, giving a chip-to-chip interconnect frequency of 550MHz or 650MHz, depending on the processor clock speed. This works out as upwards of 35GBps. The processors talk to the L3 cache using a 3:1 ratio, which gives you a bandwidth of upwards of 10GBps.
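For readers who want to check the sums, here is a minimal Python sketch of the clock ratios quoted above. The function and constant names are ours, purely for illustration. The L3 clock figures it produces are our own derivation from the 3:1 ratio, not numbers IBM quoted, and since IBM didn't give us bus widths the sketch makes no attempt to reproduce the 35GBps and 10GBps bandwidth figures.

```python
# Derived interconnect clocks for the two launch speeds.
# Ratios are taken from the article; all names here are illustrative only.
CORE_CLOCKS_MHZ = (1100, 1300)   # 1.1GHz and 1.3GHz POWER4
CHIP_TO_CHIP_DIVISOR = 2         # interconnects run at 2:1
L3_DIVISOR = 3                   # L3 cache interface runs at 3:1


def derived_clock_mhz(core_mhz, divisor):
    """Return the clock of a link that runs at core_mhz divided by divisor."""
    return core_mhz / divisor


def clock_table():
    """Map each core clock to its derived chip-to-chip and L3 clocks."""
    return {
        core: {
            "chip_to_chip_mhz": derived_clock_mhz(core, CHIP_TO_CHIP_DIVISOR),
            "l3_mhz": derived_clock_mhz(core, L3_DIVISOR),
        }
        for core in CORE_CLOCKS_MHZ
    }


if __name__ == "__main__":
    for core, clocks in clock_table().items():
        print(
            f"{core}MHz core: chip-to-chip {clocks['chip_to_chip_mhz']:.0f}MHz, "
            f"L3 {clocks['l3_mhz']:.0f}MHz"
        )
```

The point of the fixed ratio is visible in the table: move the core clock and every link clock moves with it.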
SGI has been designing its systems to maximise internal bandwidth for years, but all the commercial Unix enterprise vendors have to some extent made practical performance trade-offs.
One interesting side effect of having two cores sharing one cache is that for certain HPC workloads, such as fluid dynamics, it's better to turn one of the cores off, we learned.
The other approach to increasing instruction level parallelism, SMT or simultaneous multi-threading, involves creating 'virtual' processors while retaining only one CPU. Aside from what Dr Burton Smith (credited with pioneering SMT) calls "exotic hardware" - Cray CDC 6600, MARS-M and his own Denelcor machines - this technique hasn't been used in mainstream hardware, although that's likely to change with Intel's Xeon MP next year, and Sun's SPARC V in 2003.
I/O, I/O, it's off to work we go
IBM has snagged the zSeries' power supply and expansion units for Regatta. There's room for up to eight I/O drawers, each with room for 16 disks (IBM specifies 18.2GB or 36.4GB SCSI) giving up to 580GB per drawer.
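The per-drawer figure is easy to verify. The sketch below is a back-of-envelope check only: it assumes every bay holds the larger 36.4GB disk, counts raw capacity with no allowance for RAID or spares, and the whole-system total is our own multiplication rather than an IBM figure. The names are illustrative.

```python
# Raw disk capacity for a fully populated Regatta, using the article's figures.
# Assumes all bays hold the larger disk and ignores RAID or hot-spare overhead.
DISKS_PER_DRAWER = 16
MAX_DRAWERS = 8
LARGEST_DISK_GB = 36.4


def drawer_capacity_gb(disk_gb=LARGEST_DISK_GB, disks=DISKS_PER_DRAWER):
    """Raw capacity of one I/O drawer in GB."""
    return disk_gb * disks


def system_capacity_gb(drawers=MAX_DRAWERS):
    """Raw capacity across the given number of full drawers in GB."""
    return drawer_capacity_gb() * drawers


if __name__ == "__main__":
    print(f"Per drawer: {drawer_capacity_gb():.1f}GB")
    print(f"Eight drawers: {system_capacity_gb():.1f}GB")
```

Sixteen 36.4GB disks come to 582.4GB, which IBM rounds down to 580GB.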
IBM made much of its fault-tolerant features, without mentioning the word. The LPAR partitioning, which the AS/400 received a couple of years ago, comes to IBM's Unix line, and allows up to 16 concurrent system images. Individual chips and caches can be turned off on the fly. Probably as important is that the system can be administered as a single system image, a long-standing feature of Tandem and DEC mid-range systems, which will help manage native Linux sessions.
Less is more?
Fujitsu-Siemens currently leads the TPC-C benchmarks with a 128-way box, and at its StarCat launch last week Sun announced an option that would let you run 106 CPUs in one SMP.
"They've got lower-powered chips," Jon Barnes, IBM's EMEA chief for the pSeries, told us, "so they need to put a lot of them together."
IBM published SPECint2000 figures of 808 for the POWER4 against 467 for Sun's UltraSPARC III, SPECfp2000 of 1169 (against the UltraSPARC III's 482) and SPEC's Java benchmark of 169,000 for a 16-way POWER4 against 109,146 for a 24-way Sun UltraSPARC III box.
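To put those numbers side by side, here is a rough Python sketch that turns IBM's published figures into ratios. The per-CPU Java figure is our own crude normalisation, dividing the score by the processor count. SMP throughput doesn't scale linearly, so treat it as an illustration rather than a result either vendor published. The names are ours.

```python
# Ratios derived from the benchmark figures IBM published (as quoted above).
# Dividing the Java score by CPU count is a simplifying assumption of ours.
SPEC_SCORES = {
    "SPECint2000": {"POWER4": 808, "UltraSPARC III": 467},
    "SPECfp2000": {"POWER4": 1169, "UltraSPARC III": 482},
}
JAVA_RESULTS = {
    "POWER4": {"score": 169_000, "cpus": 16},
    "UltraSPARC III": {"score": 109_146, "cpus": 24},
}


def speedup(scores):
    """POWER4 score divided by the UltraSPARC III score."""
    return scores["POWER4"] / scores["UltraSPARC III"]


def per_cpu(result):
    """Java benchmark score divided evenly across the CPUs in the box."""
    return result["score"] / result["cpus"]


if __name__ == "__main__":
    for name, scores in SPEC_SCORES.items():
        print(f"{name}: POWER4 is {speedup(scores):.2f}x UltraSPARC III")
    for chip, result in JAVA_RESULTS.items():
        print(f"Java, {chip}: {per_cpu(result):,.1f} per CPU")
```

On these figures POWER4 comes out at roughly 1.7 times the integer score and 2.4 times the floating point score, which is the arithmetic behind Barnes' jibe.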
So the Regatta sails out like this:
The basic model is an eight-way 1.1GHz with 8GB RAM and two 18.2GB disks, with a list price of $450,000. The 16-way with 16GB is listed at $761,878. The 1.3GHz POWER4 is only available in the 'Turbo' system, while there's a third option targeted at HPC users. IBM boasted that POWER4 was shipping on schedule (it becomes available in December), and for once, made no sideswipe about UltraSPARC III... ®