Asustek opens curtain on desktop 'supercomputer'
Eee HPC, anyone?
Taiwanese motherboard and PC maker Asustek is apparently getting ready to jump into the personal supercomputer market with a glorified deskside supercomputer that it has developed in conjunction with graphics chip maker Nvidia and the National Chiao Tung University in Taiwan.
Asustek Computer, known by gearheads for its Asus brand motherboards, is probably best known these days to consumers as the originator of the Linux-based netbook, the Eee PC. The ESC1000 personal super was unveiled at an event in Taipei today, but it is not yet ready for sale. The ESC1000 was mischaracterized in the trade press and blogosphere as a supercomputer.
At best, the ESC1000 can be thought of as a beefed-up workstation or a baby supercomputer, even though it may have 1.1 teraflops of single-precision number-crunching performance. The ESC1000 will not do nearly as well on double-precision floating point math, and hence, it does not get to be called a supercomputer.
According to the story that broke in Computerworld, the Asus ESC1000 personal supercomputer (or Eee HPC, as El Reg is inclined to call such boxes) is based on a single-socket motherboard (presumably from Asustek) that has a single 3.33 GHz Xeon W3580 processor, one of a family of Nehalem EP server chips that were announced in late March, not one of the Xeon 3400 single-socket parts that came out in early September.
This W3580, running at 3.33 GHz and with a TurboBoost speed of 3.6 GHz, is the fastest of these "real" single-socket server chips. (The Lynnfields are glorified desktop chips and lack some of the memory capacity and I/O bandwidth that the proper Xeon 3500s have.) As you can see from Intel's spec sheet, this chip was introduced in the third quarter of this year and tops out at 24 GB of DDR3 main memory.
Computerworld seems to be the only trade rag that actually got its hands on a spec sheet for the ESC1000, and it says that the machine will sport three Tesla C1060 graphics co-processors as well as a Quadro FX 5800 graphics card.
A report in DigiTimes quotes Asustek sources as saying that the ESC1000 will sell at between NT$480,000 and NT$680,000, which is roughly $14,750 to $20,900 in US dollars.
It is not clear where the 1.1 teraflops performance rating that Computerworld pegged the ESC1000 at came from, but it doesn't make a lot of sense. Each Tesla C1060 has 240 cores running at 1.3 GHz, and each is rated at 933 gigaflops for single-precision math and a mere 78 gigaflops for double-precision math. Assuming the Quadro FX 5800 (which also has 240 cores) is being used as a graphics card and you ignore the math processing on the W3580 since it is running application software and the operating system, the three Tesla cards and hence the ESC1000 deliver 2.8 teraflops at single precision and 234 gigaflops at double precision - and that is at peak theoretical performance. This is nice for a workstation, but falls a little shy of a supercomputer.
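The back-of-the-envelope sums above can be checked with a few lines of Python. This is only a sketch of the arithmetic, using the per-card ratings and the card count quoted in the story; the function and constant names are made up for illustration, and like the text it ignores whatever the Quadro card and the Xeon might contribute.

```python
# Peak theoretical ratings for one Tesla C1060, as quoted in the article (gigaflops).
C1060_SP_GFLOPS = 933   # single precision
C1060_DP_GFLOPS = 78    # double precision
TESLA_CARDS = 3         # card count from the Computerworld spec sheet


def aggregate_peak(per_card_gflops, cards):
    """Sum peak ratings across identical cards (hypothetical helper).

    Assumes perfect scaling, which is what a peak theoretical rating implies.
    """
    return per_card_gflops * cards


sp_total = aggregate_peak(C1060_SP_GFLOPS, TESLA_CARDS)
dp_total = aggregate_peak(C1060_DP_GFLOPS, TESLA_CARDS)

print(f"single precision: {sp_total / 1000:.1f} teraflops")
print(f"double precision: {dp_total} gigaflops")
print(f"double precision as a share of single: {dp_total / sp_total:.1%}")
```

Whichever way the cards are counted, no whole number of C1060s at 933 gigaflops apiece lands on 1.1 teraflops, which is why the quoted rating looks odd.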
The recent baby supers from Silicon Graphics (here and here) and Cray (here and here) are at least machines with switches that bear some architectural resemblance to a modern parallel supercomputer. Interestingly, SGI's Octane III personal supercomputer can be equipped with a two-socket Xeon 5500 cookie sheet server and two Tesla C1060 GPUs for doing math.
But that same chassis can be equipped with ten two-socket Xeon 5500 servers and a Gigabit Ethernet or InfiniBand switch, making it also a baby supercomputer cluster. This Octane machine is rated at 726 gigaflops of double-precision math performance, and it has a base price of $7,995 with one Xeon cookie sheet server and a Gigabit Ethernet switch.
Basically, until the Fermi kickers to the Tesla GPUs come out and offer more reasonable double precision math performance, cramming more x64 chips in the box is something that is going to be appealing to certain kinds of work. For other work, a workstation with a few Teslas tossed in will be just fine. But the two are not interchangeable. That's for sure. ®
It is extremely difficult to get 100% performance out of these devices. The teraflops ratings they are given are for an ideal situation where the ALUs are fully occupied across all threads and the maximum memory bandwidth is being achieved.
The class of problems which can get you to this level of performance is rather small...
However, for their available processing power they use substantially less power than a cluster of CPUs, though they are admittedly harder to program efficiently.
You *could* get 1.1 teraflops by adding those two together and dividing by two...
By "could", do you mean in another universe where 2 x 1.1 doesn't make 2.2?
Thank you, El Reg
For calling "bullshit" on the performance numbers. Everyone else is trumpeting it as something Los Alamos would want for nuclear sims.