Nvidia's Fermi hits flop-hungry challengers
HPC players tool up
Nvidia's Fermi graphics coprocessors have begun shipping through its OEM partner channel with a slew of tier-two players hoping the flop-happy GPUs give them a competitive edge against established players in the HPC server racket.
The Fermi graphics cards and GPU coprocessors that are based on them were both previewed last November at the SC09 supercomputing conference. The Fermi graphics chips previewed had 512 cores, but for reasons that Nvidia has not explained - and which probably involve chip yields and heating issues - the GeForce graphics cards and Tesla 20 coprocessors that have started shipping only have 448 working cores. And that means their floating-point performance is a little lower than expected.
The Tesla coprocessors come in three different form factors, which was not apparent at the launch last November. The C series GPU coprocessors have fans on them and plug into workstations and personal supercomputers (basically, an x64 workstation on steroids); the M series are fanless units intended to be used in hybrid CPU-GPU setups within the same chassis; and the S series are GPU appliances that plug into servers through external PCI Express links and pack up to four GPUs into a 1U chassis.
Back in November, Nvidia was saying that the C2050 and the C2070, which had initial ratings of 520 and 630 gigaflops doing double-precision math and which cost $2,499 and $3,999, respectively, would use the 512-core Fermi chips. In early April, Nvidia started shipping the C2050, but with only 448 cores and rated at 515 gigaflops double-precision, and the C2070 was pushed out to the third quarter. It's a fair guess that with the number of cores dropping by 12.5 per cent in the C2050 but the aggregate performance of the GPU coprocessor only dropping by one per cent, Nvidia cranked up the clock speed to make up for the lower GPU core count.
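For the curious, that guess can be put into numbers. The sketch below is a back-of-envelope check, not anything Nvidia has published: it assumes peak flops scale simply as cores times clock, with the same work done per core per cycle on both parts, and the variable names are made up for illustration.

```python
# Back-of-envelope check of the clock-speed guess.
# Assumption (not from Nvidia): peak flops scale as cores x clock,
# with the same flops per core per cycle on both parts.

PREVIEW_CORES = 512       # cores on the Fermi chip previewed at SC09
SHIPPING_CORES = 448      # working cores on the shipping C2050
PREVIEW_GFLOPS = 520.0    # initial double-precision rating
SHIPPING_GFLOPS = 515.0   # double-precision rating of the shipping C2050

core_drop = 1 - SHIPPING_CORES / PREVIEW_CORES
perf_drop = 1 - SHIPPING_GFLOPS / PREVIEW_GFLOPS

# Ratio of per-core throughput, which under the assumption above
# equals the ratio of clock speeds.
implied_clock_ratio = (SHIPPING_GFLOPS / SHIPPING_CORES) / (
    PREVIEW_GFLOPS / PREVIEW_CORES
)

print(f"core count drop:    {core_drop:.1%}")
print(f"rated perf drop:    {perf_drop:.1%}")
print(f"implied clock bump: {implied_clock_ratio - 1:.1%}")
```

Under that assumption, the shipping part would need a clock roughly 13 per cent faster than whatever was behind the 520 gigaflops figure. If that early rating was itself a loose estimate, the real change could be quite different.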
There were to be two variations of the S series GPU appliances: the S2050 appliance using the C2050 GPUs, rated at 2.08 teraflops and costing $12,995, and the S2070 appliance using the faster C2070 GPUs, rated at 2.52 teraflops and costing $18,995. The S series boxes aren't shipping yet, and they will be based on the 448-core C series GPUs, likely providing a little less floppy oomph. Sources at Nvidia say that the S series GPU appliances are still on track for delivery this quarter.
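The appliance ratings line up with four of the originally quoted C series GPUs per 1U box, which makes it easy to guess at the haircut. The sketch below is only an estimate: the `appliance_teraflops` helper is hypothetical, and the 448-core appliance figure is an extrapolation from the shipping C2050 rating, not a number from Nvidia.

```python
# Aggregate double-precision rating of an S series appliance,
# assuming it is simply the sum of its GPUs' individual ratings.

GPUS_PER_APPLIANCE = 4  # the S series packs up to four GPUs into 1U


def appliance_teraflops(gflops_per_gpu, gpus=GPUS_PER_APPLIANCE):
    """Return the summed rating in teraflops for one appliance."""
    return gpus * gflops_per_gpu / 1000.0


s2050_preview = appliance_teraflops(520)  # matches the quoted 2.08 teraflops
s2070_preview = appliance_teraflops(630)  # matches the quoted 2.52 teraflops

# Estimate only: four 448-core GPUs at the shipping C2050's 515 gigaflops.
s2050_estimate = appliance_teraflops(515)

print(f"S2050 as previewed:     {s2050_preview:.2f} teraflops")
print(f"S2070 as previewed:     {s2070_preview:.2f} teraflops")
print(f"S2050 with 448 cores:   {s2050_estimate:.2f} teraflops (estimate)")
```

On those assumptions, a 448-core S2050 would land at about 2.06 teraflops, a shade under the previewed figure.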
Nvidia started peddling the Fermi GPUs in its GeForce graphics card lineup during the first quarter.
The news today is that the Tesla M2050 embedded GPU coprocessor, which is based on the C2050 card as the name suggests and which is rated at the same 515 gigaflops of double-precision and 1.03 teraflops single-precision floating point performance, has begun shipping through OEM server partners. Appro and Super Micro were the first to announce systems using the M series GPUs. (You have to hunt around the Nvidia site to find the M2050 spec sheet, so let me save you the trouble.)
Oak Ridge boys
Nvidia planned to host a big shindig in Washington DC kicking off the M series, with presentations from Oak Ridge National Laboratory, talking about how hybrid CPU-GPU systems were the wave of the future, and from Georgia Tech, which has a project called Keeneland for creating applications that run on hybrid CPU-GPU systems.
Oak Ridge is, of course, one of the first big customers for the Fermi GPUs. Last October, before the Fermi GPU coprocessors were unveiled by Nvidia at SC09 but after the Fermi chips on which they are based were detailed, the Cray XT "Jaguar" massively parallel Opteron super at Oak Ridge weighed in at 1.06 petaflops using the Linpack Fortran benchmark test as a gauge. Shortly thereafter, the upgraded Jaguar machine was pushed to 1.76 petaflops by the addition of new Opteron cores.
The only reason this matters is that in early October last year, Oak Ridge said that it would be building a hybrid CPU-GPU super based on Nvidia cards that would have at least ten times the oomph of Jaguar, most likely meaning it would break the 10 petaflops barrier, but not the 20 petaflops barrier. Oak Ridge was intentionally vague, perhaps because it was unsure of what the performance of such a hybrid machine might be.
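The 10-to-20 petaflops window follows from which Jaguar you take as the baseline. A quick sketch, assuming "ten times the oomph" means ten times the Linpack rating (Oak Ridge did not say which measure it had in mind):

```python
# Ten times Jaguar, using either Linpack rating as the baseline.
# Assumption: "oomph" is measured in Linpack petaflops.

JAGUAR_LINPACK_PETAFLOPS = {
    "before upgrade": 1.06,
    "after upgrade": 1.76,
}
MULTIPLIER = 10  # Oak Ridge said "at least" ten times, so this is a floor

targets = {
    label: MULTIPLIER * rating
    for label, rating in JAGUAR_LINPACK_PETAFLOPS.items()
}

for label, petaflops in targets.items():
    print(f"10x Jaguar ({label}): {petaflops:.1f} petaflops")
```

Either baseline puts the floor between 10 and 20 petaflops: 10.6 petaflops against the original Jaguar and 17.6 petaflops against the upgraded one.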
There is also a rumor going around that Oak Ridge was unhappy about the performance of the Nvidia Tesla 20 GPUs and has canceled the project, but Nvidia says this is untrue. Oak Ridge has yet to say exactly what it is building.