Original URL: http://www.theregister.co.uk/2012/11/12/amd_firepro_s10000_gpu_card/

AMD fires off crazy-fast FirePro double-whammy GPU card

Aimed at servers and heavy-duty workstations

By Timothy Prickett Morgan

Posted in HPC, 12th November 2012 23:55 GMT

SC12 AMD, a company that knows a thing or two about building powerful graphics processors, has whipped out a card that has more flops than either Nvidia's K20 high-end GPUs or Intel's x86-based Xeon Phi coprocessors.

That's one sure way to cut through all the noise that Nvidia and Intel are making about their just-released offerings: performance.

Like other double-whammy products out there, whether they're CPUs or GPUs, the FirePro S10000 has to gear back the clock speeds on the "Graphics Core Next" (GCN) GPUs on the card so it doesn't overheat or cause the server or workstation into which it is lovingly slipped to burst into flame.

FirePro, indeed.

In the Radeon HD 7870 and 7850 cards introduced in March, the GCN GPUs are clocked at 1GHz and 860MHz, respectively. The FirePro S10000 card sports two of AMD's high-end "Tahiti" GPUs spinning at 825MHz.

Each Tahiti GPU has 1,792 stream processors, just like in the single Tahiti GPU used in the existing FirePro S9000 card. But that single-GPU card runs the cores at 900MHz and the GDDR5 graphics memory at 5.5GHz to deliver its 3.23 teraflops of single-precision and 806 gigaflops of double-precision floating point performance.

With the FirePro S10000, not only is the GPU geared down to 825MHz, but the memory is similarly downshifted to 5GHz. The memory interface is 384 bits wide on each GPU, with two blocks of GDDR5 memory yielding a total of 6GB. (This could be a little skinny on the memory for some HPC workloads, given that the S9000 card has 6GB of memory for one Tahiti GPU.) Each GPU gets 240GB/sec of memory bandwidth to its own 3GB chunk of GDDR5 memory.
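The 240GB/sec figure falls out of the bus width and the memory speed. Here is a minimal sketch of that arithmetic, assuming the quoted "5GHz" is the effective per-pin data rate in Gbit/sec (the usual convention for GDDR5, though the article does not say so). The function name is ours, for illustration only.

```python
def memory_bandwidth_gb_per_s(bus_width_bits: int, effective_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/sec for one GPU.

    Assumption: effective_rate_gbps is the per-pin data rate in Gbit/sec,
    so the bus moves bus_width_bits / 8 bytes per transfer.
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_rate_gbps


# One Tahiti GPU on the S10000: 384-bit interface, memory at 5GHz effective.
per_gpu = memory_bandwidth_gb_per_s(384, 5.0)
print(f"{per_gpu:.0f}GB/sec per GPU")  # prints "240GB/sec per GPU"
```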

Because the card is double-stuffed, it can deliver a very impressive 5.91 teraflops SP and 1.48 teraflops DP in peak floating point oomph.
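The peak numbers for both cards can be rebuilt from the stream processor counts and clock speeds quoted above. The sketch below assumes each stream processor retires two single-precision flops per clock (one fused multiply-add) and that double precision runs at one quarter of the single-precision rate. Neither figure is stated in the article, but both are consistent with the numbers AMD quotes.

```python
def peak_teraflops(gpus, stream_processors, clock_mhz,
                   flops_per_clock=2, dp_ratio=0.25):
    """Return (single-precision, double-precision) peak teraflops.

    Assumptions, not taken from the article: flops_per_clock=2 (one fused
    multiply-add per stream processor per cycle) and dp_ratio=0.25.
    """
    # stream processors x MHz x flops per clock gives megaflops; divide by 1e6 for teraflops.
    sp = gpus * stream_processors * clock_mhz * flops_per_clock / 1e6
    return sp, sp * dp_ratio


s9000_sp, s9000_dp = peak_teraflops(gpus=1, stream_processors=1792, clock_mhz=900)
s10000_sp, s10000_dp = peak_teraflops(gpus=2, stream_processors=1792, clock_mhz=825)

print(f"S9000:  {s9000_sp:.2f} teraflops SP, {s9000_dp * 1000:.0f} gigaflops DP")
print(f"S10000: {s10000_sp:.2f} teraflops SP, {s10000_dp:.2f} teraflops DP")
```

Under those assumptions the S9000 works out to 3.23 teraflops SP and 806 gigaflops DP, and the S10000 to 5.91 teraflops SP and 1.48 teraflops DP, matching the quoted figures.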

The FirePro S10000 dual-GPU card

Doubling up the GPUs on the card pushes the thermal design point up to 375 watts, compared to 225 watts for the single-GPU S9000. If you are trying to cram as much floppage as you can get into a server or workstation, this is a good tradeoff.

The S9000 costs $2,499 at list price while the S10000 costs $3,599, and if you do the math the S10000 delivers $6.48 per DP teraflops per watt, while a pair of S9000s will give you $6.89 per DP teraflops per watt.
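For those who would rather check the math than do it, here is a small sketch. It assumes the pair of S9000s is treated as one unit, with prices, peak DP teraflops, and thermal design points each added together, which is the reading that reproduces the figures above. The function name is hypothetical.

```python
def dollars_per_dp_teraflops_per_watt(price_usd, dp_teraflops, watts):
    """List price divided by peak DP teraflops, then divided by thermal design watts."""
    return price_usd / dp_teraflops / watts


# One S10000: $3,599, 1.48 teraflops DP, 375 watts.
s10000 = dollars_per_dp_teraflops_per_watt(3599, 1.48, 375)

# Two S9000s, summed (our assumption): 2 x $2,499, 2 x 0.806 teraflops DP, 2 x 225 watts.
s9000_pair = dollars_per_dp_teraflops_per_watt(2 * 2499, 2 * 0.806, 2 * 225)

print(f"S10000:     ${s10000:.2f}")      # prints "S10000:     $6.48"
print(f"Two S9000s: ${s9000_pair:.2f}")  # prints "Two S9000s: $6.89"
```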

That's not much of an improvement, mind you, but recall that the S10000 takes up only one x16 slot in a server or workstation, and two S9000s will eat up two slots.

The cost of each SP or DP flops is a bit lower with the S10000, too – to be precise, 21.3 per cent. So for companies wanting to build GPU-goosed clusters or workstations, the S10000 is going to be preferred over the S9000 because it eats up less space and delivers better value.
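The saving per flops can be sketched from the list prices and the rounded peak ratings quoted earlier. The helper below is ours, not AMD's. With these rounded inputs it gives 21.3 per cent for single precision and roughly 21.6 per cent for double precision.

```python
def percent_cheaper(new_price, new_teraflops, old_price, old_teraflops):
    """Percentage by which the new card's dollars-per-teraflops undercuts the old card's."""
    new_cost = new_price / new_teraflops
    old_cost = old_price / old_teraflops
    return (1 - new_cost / old_cost) * 100


# S10000 ($3,599) against a single S9000 ($2,499), using the rounded peak figures.
sp_saving = percent_cheaper(3599, 5.91, 2499, 3.23)
dp_saving = percent_cheaper(3599, 1.48, 2499, 0.806)

print(f"SP: {sp_saving:.1f} per cent cheaper per flops")
print(f"DP: {dp_saving:.1f} per cent cheaper per flops")
```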

The S10000 is not just a compute engine for OpenCL applications; it's also a video card if you want to use it for visualization. (Or, for some workloads, you might use a bunch of these for compute and then shift over to visualization when the number-crunching is done.)

The S10000 has one DVI port and four Mini DisplayPorts. The card supports AMD's Eyefinity multiple-display feature, but does not support its CrossFire feature, which allows the ganging up of multiple GPUs to drive a single display two, three, or four times faster if you add that many cards. (If you do that, you either don't have children or you are the coolest dad in the neighborhood.)

You have to have a pretty fat power supply to use an S10000, of course, and the question then becomes how many of these FirePro S10000s you can cram into the workstation or the server before the fire marshal comes around.

AMD says that you need a PCI-Express 2.0 or 3.0 x16 slot to use this card, and a 3.0-generation slot is preferable to get the best performance. That means you need a workstation or a server using the Xeon E3-1200 v2 or Xeon E5 processor from Intel, since these are the only machines that support PCI-Express 3.0 at the moment.

You need a box with a 750 watt or higher power supply with two 150 watt PCI-Express AUX 8-pin power connectors. And with three fans needed to cool the card itself, you might want to have a pretty powerful cooling system for your chassis, as well.

Your server or workstation needs to have at least 2GB of system memory, and you need to be running Linux or Windows Vista, 7, 8, or Server 2008 R2 SP1.

And yes, it can play Crysis. ®