Server makers wrap iron around Nvidia GPUs
How many Teslas can you screw into a chassis?
Supercomputer buyers don't want to spend months building hybrid CPU-GPU clusters. They want to buy them pre-integrated and ready to start flopping within a matter
In the wake of the announcement of the Nvidia Tesla M2090 GPU coprocessor for servers two weeks ago, Marc Hamilton, vice president of high performance computing at Hewlett-Packard, said in his blog that he had challenged his HPC team to come up with a pre-integrated rack of servers that would deliver at least 10 teraflops of floating point performance and cost under $100,000.
The GPU Starter Kit, which will be launched at the HP Discover customer and partner shindig in Las Vegas next week, didn't need to use the M2090 fanless GPU coprocessor in the servers to hit the feeds and speeds Hamilton laid out. The starter kit has two of the ProLiant SL6500 tray server chassis and eight of the ProLiant SL390s G7 2U compute nodes, which HP quietly launched in April and which slide into the chassis with room for three GPUs apiece.
The server nodes each have two Intel Xeon X5675 processors running at 3.06GHz, and across the eight nodes, that works out to a peak of 1.18 teraflops of double-precision floating point processing power. Each node is equipped with three M2070 fanless GPU coprocessors – these run at 1.15GHz and only have 448 out of the possible 512 cores activated – for a total of 12.36 teraflops of oomph at double-precision across the 24 GPUs. That's a combined 13.54 teraflops in a rack across the CPUs and GPUs.
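For those who like to check the sums, here is a minimal Python sketch of the peak-flops arithmetic. The story gives only the totals; the per-part figures plugged in here (six cores per Xeon X5675, four double-precision flops per core per clock, and 515 gigaflops of peak double-precision per M2070) are assumptions drawn from the vendors' published specs, not from HP's announcement.

```python
# Peak double-precision arithmetic for the GPU Starter Kit.
# Assumed, not stated in the article: 6 cores per X5675, 4 DP flops
# per core per clock, and 515 gigaflops peak DP per Tesla M2070.

NODES = 8
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 6        # assumption: Xeon X5675 is a six-core part
CLOCK_GHZ = 3.06
FLOPS_PER_CLOCK = 4         # assumption: DP flops per core per cycle
GPUS_PER_NODE = 3
M2070_GFLOPS = 515          # assumption: vendor peak DP rating

# Gigaflops from all sixteen CPU sockets, converted to teraflops
cpu_tf = (NODES * SOCKETS_PER_NODE * CORES_PER_SOCKET
          * CLOCK_GHZ * FLOPS_PER_CLOCK) / 1000

# Gigaflops from all 24 GPUs, converted to teraflops
gpu_tf = NODES * GPUS_PER_NODE * M2070_GFLOPS / 1000

total_tf = cpu_tf + gpu_tf

print(f"CPUs:  {cpu_tf:.2f} teraflops")
print(f"GPUs:  {gpu_tf:.2f} teraflops")
print(f"Total: {total_tf:.2f} teraflops")
```

Under those assumptions the sketch lands on the same 1.18, 12.36, and 13.54 teraflops quoted above. These are peak ratings, of course, not what any real application will sustain.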
The GPU Starter Kit will come with Red Hat Enterprise Linux preinstalled on the nodes as well as HP's own Cluster Management Utility and Linux Value Pack extensions for HPC customers. The CUDA development environment and runtime will be slapped onto the machines, too. The rack comes with one DL380 as a control node, a 36-port InfiniBand switch, and a 24-port Ethernet switch. You basically turn it on, hook it up to networks and storage, and start running applications in under a day.
HP could make a much denser and more powerful ceepie-geepie machine if it wanted to. The first step would be to move to the M2090 GPU from Nvidia, which runs at a higher clock speed, has more memory bandwidth, and has all 512 cores on the GPU humming to deliver 665 gigaflops of double-precision math each. That yields just under 16 teraflops for 24 GPU coprocessors.
But HP could do better than this by switching to the 4U version of the ProLiant SL390s tray server, which has eight GPUs per two-socket server. (There is plenty of room in the rack to do this.) Switching to this bigger tray server and putting in four SL6500 chassis yields 31.9 teraflops of GPU performance plus the 1.18 teraflops from eight server nodes, for a total of 33.1 teraflops of oomph. It is hard to say what HP might charge for this.
Presumably, the GPU Starter Kit will have a variant like the one outlined above, and it would be reasonable to surmise that it would cost somewhere around $150,000 to $175,000 if the setup outlined by Hamilton costs $100,000. (Nvidia does not provide pricing for the M series of Tesla GPUs, so it is hard to say for sure.) Perhaps equally significantly, there is room in the rack to put another eight of the SL390s G7 nodes in the 4U trays and double up the performance again in the rack to 66.2 teraflops for maybe $300,000 to $350,000.
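To put those guesses on a common footing, here is a rough Python sketch that turns each configuration into dollars per peak teraflop. The prices are the surmises above (with the starter kit taken at the full $100,000 ceiling), not HP list prices, and the function and label names are made up for illustration.

```python
# Dollars per peak double-precision teraflop for the three setups
# discussed above. Prices are the article's estimates, not HP pricing.

def dollars_per_teraflop(price_usd, teraflops):
    """Return the cost in dollars of one peak teraflop."""
    return price_usd / teraflops

# label: (low price guess, high price guess, peak teraflops)
configs = {
    "GPU Starter Kit (M2070s, 2U nodes)": (100_000, 100_000, 13.54),
    "Eight 4U nodes with M2090s": (150_000, 175_000, 33.1),
    "Sixteen 4U nodes with M2090s": (300_000, 350_000, 66.2),
}

for label, (low, high, teraflops) in configs.items():
    cheap = dollars_per_teraflop(low, teraflops)
    dear = dollars_per_teraflop(high, teraflops)
    print(f"{label}: ${cheap:,.0f} to ${dear:,.0f} per teraflop")
```

If the guesses hold, the denser setups would come in at roughly $4,500 to $5,300 per teraflop, against about $7,400 for the starter kit at its $100,000 ceiling. Doubling the node count doubles both the price and the flops, so the ratio does not budge.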
Super Micro wants to ride the ceepie-geepie wave and to sell lots of systems to customers who don't want to pay the IBM, Dell, or HP premium. At the Computex trade show in Taipei, Taiwan this week, Super Micro is showing off two forthcoming CPU-GPU hybrid rack servers that provide slightly more GPU density than the current machines it sells and sport the Nvidia M2090 coprocessor, too.
The SuperServer 1026GT-TRF-FM307 ceepie-geepie
The first new GPU-designed machine, the SuperServer 1026GT-TRF-FM307, is a 1U rack server that has three of the double-wide Tesla M2090 GPU coprocessors crammed into the box, with 20 fans to keep air moving inside the chassis – not counting the fans in the redundant 1,800 watt power supplies, which are rated at 94 per cent efficiency. The CPU part of the system is based on a Super Micro mobo with two Intel Xeon 5500 or 5600 series processors and the Intel 5520 chipset. The system board has six SATA ports, and the chassis has room for four hot-swap 2.5-inch SATA drives that mount ahead of the mobo at the front of the chassis. After plugging in the three Tesla GPUs, there is a single PCI-Express 2.0 x8 slot open for peripheral expansion.
The SuperServer 202GT-TRF-FM407 is a 2U rack server with a single dual-socket motherboard, based on Intel's Xeon 5500/5600 processors, that runs down the middle of the chassis. That server supports up to 96GB of main memory and has two Gigabit Ethernet ports on the system board for clustering. The unit puts four Tesla GPUs into the chassis, two on the left front and two on the right front, stacked atop each other. The motherboard (made by Super Micro itself) has four PCI-Express 2.0 x16 slots, one for each Tesla GPU.
The unit has room for ten 2.5-inch disk drives, which hot plug into the front of the chassis; the board only has six SATA ports, so four of the bays are spares. The unit comes with redundant 1,800 watt power supplies that are rated at 94 per cent efficiency.
The SuperServer 202GT-TRF-FM407 ceepie-geepie
It is not clear when these Super Micro CPU-GPU hybrids will be available; the company had not responded to requests for availability and pricing at press time.
Over at IBM, one of the preferred platforms for CPU-GPU hybrid computing is the iDataPlex hybrid, which is a cross between a blade and rack server that comes in a double-wide, half-depth rack that packs up to 84 servers into a single chassis. (The other is the BladeCenter GPU expansion blade for its HS22 blade servers.)
This week, the iDataPlex dx360 M3, announced last May, is being updated to support Nvidia's Tesla M2090 GPU coprocessors. Customers buying iDataPlex dx360 M3 servers will be able to use these M2090 GPUs, which are fanless and rely on the cooling in the server chassis and rack to keep them from melting.
IBM is also, however, going on the cheap and allowing customers to plug Nvidia's Quadro 4000 and 5000 series of graphics cards, which can run the same CUDA software as the Tesla coprocessors, into the iDataPlex dx360 M3 machines. All three GPU options will ship on July 29 and are paired with two-socket servers in the iDataPlex chassis using Xeon 5500 or 5600 series chips. IBM does not provide public pricing for the iDataPlex line. ®