Original URL: http://www.theregister.co.uk/2010/11/17/convey_computer_upgrade/

Convey upgrades hybrid core supercomputer

Drop-in FPGA upgrade

By Timothy Prickett Morgan

Posted in HPC, 17th November 2010 05:00 GMT

SC10 Not all of the talk at the SC10 supercomputing extravaganza in New Orleans is about GPU co-processors. Convey Computer – which burst onto the HPC scene two years ago with a hybrid supercomputer that employs x64 processors and field programmable gate array (FPGA) accelerators – is upgrading its HC-1 supercomputer with a new generation of FPGAs.

Convey Computer's Hybrid Core-1 super is not all the rage like hybrid CPU-GPU machines, but for certain applications, a more expensive FPGA makes more sense as an accelerator because it can be programmed to have different personalities and to accelerate very specific algorithms. The original HC-1 runs a Linux software stack (of course), and puts a Xilinx Virtex-5 FPGA inside of an old Xeon socket using the frontside bus architecture from the Xeon 5300 generation.

Convey has licensed both the frontside bus and the newer QuickPath Interconnect used in more modern Xeon 5500, 5600, 6500, and 7500 processors from Intel. But it has not yet implemented a QPI version of the machine. Bruce Toal, the company's co-founder and chief executive officer, said "it is getting close."

With the HC-1ex supercomputer announced at SC10, Convey is upgrading to the latest Virtex-6 FPGA, which offers nearly four times the elements of the prior generation of FPGAs and is yielding somewhere between two and three times more performance running certain algorithms compared to the HC-1 that started shipping in volume in the middle of last year.

The HC-1 and now the HC-1ex basically take a two-socket Xeon server and turn one of those sockets into a math co-processor for the other socket. (You can get the full details of the HC-1 architecture here in our coverage from a year ago.) Convey's innovation is not pairing an FPGA with a CPU, but doing so using a custom motherboard that allows the two devices to have a cache-coherent shared virtual memory space. With the integrated programming environment that Convey has created, applications see the x64 instruction set and a set of co-processor instructions implemented in the FPGA's personality. Programmers using standard C, C++, and Fortran compilers see the extra instructions implemented in the FPGA and can make use of them in their code.

The FPGA chip has 16 memory channels reaching out to the system, providing 80 GB/sec of bandwidth into the FPGA. The HC-1 system boards have four DIMM channels for the single x64 processor and 16 DIMM channels for the FPGA, which are linked to each other through the front side bus architecture - just like Xeon processors are hooked to each other in a two-socket machine. The HC-1 uses standard DIMMs that have been optimized for cache line transfers (sequential access) and also has a special set of scatter-gather DIMMs (SG-DIMMs) that are optimized for 8-byte transfers (random access). The Xeon side of the original system can support up to 32 GB of main memory for applications using 8 GB DIMMs, while the co-processor side of the system board can support up to 128 GB of memory.
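The memory figures above lend themselves to a quick back-of-the-envelope check. The Python sketch below divides the quoted 80 GB/sec across the 16 channels, then estimates how much of a transfer is useful when a program wants only 8 bytes at a time. The 64-byte cache line size is an assumption (it is the usual x64 line size; the article does not give the transfer size of Convey's standard DIMMs), and the function names are purely illustrative.

```python
# Figures quoted in the article.
FPGA_MEMORY_CHANNELS = 16
FPGA_BANDWIDTH_GB_S = 80.0
SG_DIMM_TRANSFER_BYTES = 8

# Assumption: a 64-byte cache line, as on typical x64 parts.
# The article does not state the line size used by the standard DIMMs.
ASSUMED_CACHE_LINE_BYTES = 64


def per_channel_bandwidth(total_gb_s: float, channels: int) -> float:
    """Average bandwidth per memory channel, in GB/sec."""
    return total_gb_s / channels


def useful_fraction(wanted_bytes: int, transfer_bytes: int) -> float:
    """Share of each transfer that the program actually asked for."""
    return wanted_bytes / transfer_bytes


if __name__ == "__main__":
    print(per_channel_bandwidth(FPGA_BANDWIDTH_GB_S, FPGA_MEMORY_CHANNELS))
    print(useful_fraction(SG_DIMM_TRANSFER_BYTES, ASSUMED_CACHE_LINE_BYTES))
    print(useful_fraction(SG_DIMM_TRANSFER_BYTES, SG_DIMM_TRANSFER_BYTES))
```

Under that assumption, a random 8-byte read served by a full cache line transfer uses only an eighth of the data moved, which is the kind of waste the SG-DIMMs are meant to avoid.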

Xeon times four

With the HC-1ex, the Xeon side of the machine has been upgraded to a quad-core Xeon 5400 processor running at 2.13 GHz and its memory has been boosted to 128 GB. The HC-1 has also been upgraded to 128 GB of memory for the CPU and has four Xilinx Virtex 5 LX330 FPGAs packaged up to fit into a single socket, while the HC-1ex has four Virtex 6 LX760 FPGAs in a socket. Both machines now support 128 GB of standard DDR2 main memory and 64 GB of SG-DIMMs. The HC-1 comes in a 2U chassis, while the HC-1ex comes in a 3U chassis. Both machines are based on a custom motherboard manufactured by Intel for Convey, which has a single PCI-Express 2.0 x16 slot, an integrated 3 Gb/sec SATA disk controller, and two on-board Gigabit Ethernet ports.

With the quad-core Xeon on one side and the souped-up FPGA on the other side, Toal says that a single HC-1ex node can do between two and three times the work of the prior HC-1 node. The older HC-1 machine is still available and costs $25,000 while the newer machine costs only $35,000. That works out to a 30 to 50 per cent improvement in bang for the buck on the computational jobs that the Convey machines are designed to run.

More importantly, says Toal, on certain workloads where the FPGA is morphed into a computer that does a very specific kind of computation and wastes no resources, the new HC-1ex can do a lot more work than a two-socket, eight-core Xeon box from Intel. For instance, one of Convey's customers needed to do 2-bit math, and programmed the FPGA to do this against 2-bit registers. On a conventional 32-bit or 64-bit processor, the registers stay 32 or 64 bits wide no matter how small the data is, and most of the resources on the chip are wasted. On bioinformatic applications that sift through DNA's genetic code, the HC-1 is showing about 25 times the oomph of a two-socket Xeon 5400 server and the HC-1ex is showing about 50 times the throughput.
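To get a feel for how much a wide register wastes on narrow data, here is a minimal Python sketch. It assumes a 2-bit encoding of the four DNA bases (A, C, G, T); that is a common illustration of 2-bit data, not a description of what Convey's customer actually ran, and every name in it is hypothetical.

```python
# Hypothetical 2-bit encoding of DNA bases; the article does not describe
# the data format the customer used.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def pack_bases(sequence: str) -> int:
    """Pack bases into one integer, two bits per base, first base lowest."""
    word = 0
    for position, base in enumerate(sequence):
        word |= BASE_TO_BITS[base] << (2 * position)
    return word


def unpack_bases(word: int, count: int) -> str:
    """Recover `count` bases from an integer produced by pack_bases."""
    return "".join(
        BITS_TO_BASE[(word >> (2 * position)) & 0b11]
        for position in range(count)
    )


def wasted_fraction(value_bits: int, register_bits: int) -> float:
    """Share of a register left unused when it holds one narrow value."""
    return 1 - value_bits / register_bits


if __name__ == "__main__":
    packed = pack_bases("GATTACA")
    print(unpack_bases(packed, 7))
    print(wasted_fraction(2, 32), wasted_fraction(2, 64))
```

Holding a single 2-bit value in a 64-bit register leaves almost 97 per cent of it idle. Software can claw some of that back by packing 32 values per word, as above, but it then pays for the shifting and masking, which is overhead that logic built for 2-bit operands does not need.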

"These are examples of where the HC-1ex just screams," says Toal. The price/performance advantages for these workloads are huge, even at the high price that customers pay for one Convey node. Based on Toal's numbers, it would take around $250,000 worth of two-socket x64 servers to match the performance of a single $35,000 HC-1ex machine, assuming a configured x64 server costs around $5,000. This is a factor of seven improvement in bang for the buck.
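The arithmetic behind that factor of seven is easy to reproduce. The sketch below assumes the 50x bioinformatics figure is the speedup in play, so that matching one HC-1ex takes 50 x64 servers at $5,000 apiece; the function names are illustrative only.

```python
# Figures quoted in the article.
HC1EX_PRICE_USD = 35_000
X64_SERVER_PRICE_USD = 5_000

# Assumption: the 50x throughput cited for DNA workloads is the speedup
# behind the $250,000 figure.
ASSUMED_SPEEDUP = 50


def equivalent_x64_cost(speedup: int, server_price: int) -> int:
    """Cost of enough x64 servers to match one accelerated node."""
    return speedup * server_price


def bang_for_buck_factor(speedup: int, server_price: int, node_price: int) -> float:
    """How many times more work per dollar the accelerated node delivers."""
    return equivalent_x64_cost(speedup, server_price) / node_price


if __name__ == "__main__":
    print(equivalent_x64_cost(ASSUMED_SPEEDUP, X64_SERVER_PRICE_USD))
    print(round(bang_for_buck_factor(
        ASSUMED_SPEEDUP, X64_SERVER_PRICE_USD, HC1EX_PRICE_USD), 1))
```

The ratio comes out at a little over seven, and it moves in direct proportion to both the speedup and the assumed server price, so a pricier baseline server widens the gap.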

Toal says that one of the big national labs in the United States is one of Convey's 30 customers, and that thus far the company has gotten traction in government agencies, bioinformatics companies, and oddly enough IT vendors who want to play around with FPGAs. At the moment, the typical customer buys from one to eight HC-1 or HC-1ex nodes.

To date, most of Convey's customers are in the proof of concept phase, which is one of the reasons why Convey has not been pushing hard to upgrade the underlying hardware platform to the more modern QPI-based Xeon systems. Obviously, these more current Xeon chips and chipsets have much more memory bandwidth and will be appealing to customers who want to roll Convey's hybrid supers out into production with many nodes working in concert.

Toal would not tip his cards to say whether the company would go with custom two-socket machines in future designs based on future "Sandy Bridge" Xeons or push the iron even harder up to Xeon 7500 boxes, which can scale from two to eight sockets and to much larger memory capacities. Toal did admit that there are interesting possibilities to scale up the iron.

Convey quietly shipped the first HC-1ex hybrid machine back in September to the Georgia Institute of Technology, and the machines are generally available now. Convey has also inked partnerships with five companies to help push the platform. Panasas is now a network-attached storage partner for the hybrid core machines. The company has tapped AutoESL, Impulse, and Jacquard Computing for their compiler-to-FPGA porting tools.

Voci, which has created speech recognition software that runs on the Convey boxes, is launching an appliance version of its application on the HC-1 and HC-1ex called V-Blaze that it will in turn sell to companies that want to add speech recognition and speech-to-text to their own applications. Voci says that the V-Blaze appliance can take the audio from over a hundred phone conversations and convert it to text on the fly. There are probably all kinds of nefarious as well as useful purposes this could be put to. ®