AMD muscles Nvidia with fanless GPU coprocessors
Anything you can do, we can do. Except ECC
Keeping pace with Nvidia in the GPU wars, Advanced Micro Devices has not only launched its "Lisbon" Opteron 4100 processors but also released the embedded versions of its "Cypress" family of GPUs, a counterpunch to Nvidia's "Fermi" chips used in its Tesla embedded GPUs.
The Cypress GPUs already made their way into the ATI Radeon HD 5870 discrete graphics cards (last October) and the ATI FirePro V8800 graphics cards for high-end workstations (back in April). Now the Cypress GPUs are being plunked into the third generation of FireStream GPU coprocessors intended for embedded applications where the GPUs do complex math that an x64 can't do without both taking its shoes off and pulling its pants down (if it is male) or lifting its shirt up (if it is female).
The Cypress GPU is no slouch, just like Nvidia's Fermi GPUs. And just as Intel and AMD are fierce competitors that get the best of each other every now and again, the competition between AMD and Nvidia drives innovation forward. The Cypress GPU gets the normal fan-cooled packaging for the Radeon HD and FirePro discrete graphics cards, with the major difference being that the FirePro cards have more video memory. With the FireStream GPU coprocessors, the units are equipped with a passive heat sink that allows them to slide into rack and tower servers, creating the hybrid x64-GPU systems that many think will soon become the norm in the HPC arena.
Here's the block diagram laying out the Cypress GPU components:
The Cypress chip has 1,600 SIMD engine cores and a slew of supporting electronics wrapped around them so they can do math with their clothing still intact. The AMD GPU has full support for the DirectCompute 11 and OpenCL 1.0 graphics and number-crunching protocols embedded in its hardware, and also includes 32-bit atomic operations, flexible 32KB local data shares, 64KB global data shares, global synchronization, and append/consume buffers etched onto its silicon.
With all of its cores working properly, the Cypress GPU can deliver 2.72 teraflops of single-precision and 544 gigaflops of double-precision floating point performance. While there are some workloads that can use single-precision just fine (some life sciences and oil and gas exploration apps are fine with single precision), most flop heads care about double-precision. And in this case, the ATI Cypress GPU can hold its own against the best Fermi that Nvidia has. However, Nvidia makes much about the fact that the ATI GPU does not have error correction on its cores and GDDR memory — and AMD acknowledges that's a feature it needs to add.
Double-precision math is more interesting to a lot of organizations looking to do more flops. The first FireStream embedded GPUs, from October 2006, were glorified Radeon X19XX GPUs with only single-precision math. The FireStream 9170s hit 500 single-precision gigaflops and added double-precision math — albeit substantially less than you might expect.
In the summer of 2008, ATI kicked out the FireStream 9250 (1 teraflops SP and 200 gigaflops DP) and 9270 (1.2 teraflops SP and 240 gigaflops DP) embedded GPUs. The 9250s were single-slot devices with 1GB of GDDR3 graphics memory rated at under 120 watts, while the 9270s were double-slotters with 2GB of faster GDDR5 memory rated at 160 watts. These units had fans, which screwed up the airflow inside of servers and therefore limited their adoption in HPC clusters. That's why both Nvidia and AMD are going with passive heat sinks with their latest embedded GPUs.
The new entry-level embedded AMD GPU, the FireStream 9350, is the one to go for if you're looking for the best way to put the most flops in a box. With 2GB of GDDR5 graphics memory, 2 teraflops SP and 400 gigaflops DP performance, it is basically twice the GPU of its predecessor, the FireStream 9250. The FireStream 9350 has 1,440 of its SIMD engine cores working — presumably the other 160 are duds — and runs at 700MHz with a memory clock of 1GHz.
The AMD FireStream 9350 Embedded GPU
At 150 watts, the 9350 embedded GPU runs a little hotter than its predecessor, but an extra 30 watts or so to double the performance is a very good Moore's Law trade-off. Equally important, the FireStream 9350, at $799, is cheaper than the 9250 GPU, which cost $999. A teraflops of double-precision oomph from FireStream 9250 cards would run you just under $5,000, and with the 9350 GPUs, you're talking just under $2,000 per double-precision teraflops.
Some applications need more graphics memory, and some customers want as much DP number crunching as they can get in each device and are willing to sacrifice two slots in their server to get it. That's what the top-end FireStreams are all about. The FireStream 9370 embedded GPU has all of its 1,600 SIMD engine cores working, runs at 825MHz with a memory clock of 1.15GHz, includes 4GB of GDDR5 graphics memory, and is rated at 2.64 teraflops SP and 528 gigaflops DP. This is twice the memory and more than twice the performance of the FireStream 9270 it replaces, although it took a lot more heat (225 watts compared to 160 watts) to get there.
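The peak ratings quoted for the two new cards can be roughly reproduced from core count and clock speed. The sketch below assumes each core retires one multiply-add (two flops) per clock and that double precision runs at one-fifth the single-precision rate. Neither factor appears in the figures above; they are assumptions that happen to line up with the quoted numbers, and the helper name `peak_gflops` is made up for illustration.

```python
# Assumptions (not stated in the article): 2 flops per core per clock,
# and double precision at one-fifth the single-precision rate.
FLOPS_PER_CORE_PER_CLOCK = 2
DP_RATIO = 1 / 5


def peak_gflops(cores, clock_mhz):
    """Return (single-precision, double-precision) peak gigaflops."""
    sp = cores * FLOPS_PER_CORE_PER_CLOCK * clock_mhz / 1000
    return sp, sp * DP_RATIO


# Core counts and clocks as quoted for the two new FireStream cards.
for name, cores, clock_mhz in [
    ("FireStream 9350", 1440, 700),
    ("FireStream 9370", 1600, 825),
]:
    sp, dp = peak_gflops(cores, clock_mhz)
    print(f"{name}: {sp:.0f} gigaflops SP, {dp:.0f} gigaflops DP")
```

Under those assumptions the 9350 lands at a shade over 2 teraflops SP and 400 gigaflops DP, and the 9370 at 2.64 teraflops SP and 528 gigaflops DP, in line with the ratings quoted above.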
The AMD FireStream 9370 Double-Wide Embedded GPU
The FireStream 9370 costs $1,999, just like the 9270 did, although a little while after the 9270 was out, AMD cut the price to $1,499 to better compete against Nvidia. When you look at double-precision math only, the 9370 is still the better deal by far, at $3,786 per teraflops, compared to the 9270 at its original price (which works out to $8,329 per teraflops) or its reduced price ($6,246 per teraflops).
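For anyone who wants to check the bang-for-the-buck sums, here is a minimal sketch that divides list price by peak double-precision teraflops for the old and new FireStreams. The prices and ratings are the ones quoted in this story; the function name `dollars_per_dp_teraflops` is just an illustrative label.

```python
def dollars_per_dp_teraflops(price_usd, dp_gigaflops):
    """List price divided by peak double-precision teraflops."""
    return price_usd / (dp_gigaflops / 1000)


# (card, list price in dollars, peak DP gigaflops), as quoted in the article.
cards = [
    ("FireStream 9250", 999, 200),
    ("FireStream 9350", 799, 400),
    ("FireStream 9270 at launch", 1999, 240),
    ("FireStream 9270 after price cut", 1499, 240),
    ("FireStream 9370", 1999, 528),
]

for name, price, dp in cards:
    cost = dollars_per_dp_teraflops(price, dp)
    print(f"{name}: ${cost:,.0f} per DP teraflops")
```

These are peak ratings and list prices only, so the results say nothing about what a real application will sustain or what a reseller will actually charge.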
You can see that AMD knows it needs to meet or beat Nvidia's Fermi GPUs for embedded coprocessors in HPC servers. Nvidia's M2050 and M2070 GPU coprocessors, which debuted in early May, are rated at 1.03 teraflops SP and 515 gigaflops DP, and throw off the same 225 watts of heat as the top-end FireStream based on the Cypress GPU. Pricing is not available for these units from Nvidia, so it's hard to make any comparisons on flops-per-buck.
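Flops-per-buck may be off the table, but flops-per-watt can at least be roughed out from the ratings above. The sketch below uses only the peak figures and wattages quoted in this story; the helper name `gflops_per_watt` is hypothetical, and the results are paper numbers, not measured efficiency.

```python
def gflops_per_watt(gigaflops, watts):
    """Peak gigaflops divided by rated watts."""
    return gigaflops / watts


# (card, SP gigaflops, DP gigaflops, rated watts), as quoted in the article.
cards = [
    ("AMD FireStream 9370", 2640, 528, 225),
    ("Nvidia M2050/M2070", 1030, 515, 225),
]

for name, sp, dp, watts in cards:
    print(
        f"{name}: {gflops_per_watt(sp, watts):.2f} SP and "
        f"{gflops_per_watt(dp, watts):.2f} DP gigaflops per watt"
    )
```

On these peak ratings the two vendors are nearly level on double precision, with the FireStream well ahead on single precision.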
AMD expects the new FireStream embedded GPUs to start shipping in the third quarter, and Patricia Harrell, director of stream computing at the company, says the target is sometime in August. ®