Convey upgrades hybrid core supercomputer
Drop-in FPGA upgrade
SC10 Not all of the talk at the SC10 supercomputing extravaganza in New Orleans is about GPU co-processors. Convey Computer – which burst onto the HPC scene two years ago with a hybrid supercomputer that employs x64 processors and field programmable gate array (FPGA) accelerators – is upgrading its HC-1 supercomputer with a new generation of FPGAs.
Convey Computer's Hybrid Core-1 super is not all the rage like hybrid CPU-GPU machines, but for certain applications a more expensive FPGA makes more sense as an accelerator because it can be programmed to take on different personalities and to accelerate very specific algorithms. The original HC-1 runs a Linux software stack (of course) and puts a Xilinx Virtex-5 FPGA into an old Xeon socket, using the frontside bus architecture from the Xeon 5300 generation.
Convey has licensed both the frontside bus and the newer QuickPath Interconnect (QPI), used in the more modern Xeon 5500, 5600, 6500, and 7500 processors, from chip giant Intel. But it has not yet implemented a QPI version of the machine. Bruce Toal, the company's co-founder and chief executive officer, said "it is getting close."
With the HC-1ex supercomputer announced at SC10, Convey is upgrading to the latest Virtex-6 FPGA, which offers nearly four times the elements of the prior generation of FPGAs. The new machine is yielding somewhere between two and three times the performance of the HC-1 on certain algorithms. The HC-1 started shipping in volume in the middle of last year.
The HC-1 and now the HC-1ex basically take a two-socket Xeon server and turn one of those sockets into a math co-processor for the other socket. (You can get the full details of the HC-1 architecture here in our coverage from a year ago.) Convey's innovation is not pairing an FPGA with a CPU, but doing so on a custom motherboard that gives the two devices a cache-coherent shared virtual memory space. With the integrated programming environment that Convey has created, applications see the x64 instruction set plus a set of co-processor instructions implemented in the FPGA's personality. Programmers using standard C, C++, and Fortran compilers see those extra instructions and can make use of them in their code.
The FPGA chip has 16 memory channels reaching out to the system, providing 80 GB/sec of bandwidth into the FPGA. The HC-1 system boards have four DIMM channels for the single x64 processor and 16 DIMM channels for the FPGA, and the two chips are linked to each other through the frontside bus architecture, just like Xeon processors are hooked to each other in a two-socket machine. The HC-1 uses standard DIMMs that have been optimized for cache line transfers (sequential access) and also has a special set of scatter-gather DIMMs (SG-DIMMs) that are optimized for 8-byte transfers (random access). The Xeon side of the original system can support up to 32 GB of main memory for applications using 8 GB DIMMs, while the co-processor side of the system board can support up to 128 GB of memory.
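For readers who want to check the sums, here is a rough back-of-the-envelope sketch of the memory figures quoted above. It assumes one DIMM per channel and 8 GB DIMMs on both sides of the board; the article only confirms the 8 GB DIMM size for the Xeon side, so the co-processor figure is an inference that happens to match the quoted 128 GB. The per-channel bandwidth is likewise just the quoted 80 GB/sec divided evenly across 16 channels, not a number Convey has published. All variable and function names are illustrative.

```python
# Back-of-the-envelope check of the HC-1 memory figures quoted in the article.
# Assumptions (not confirmed by Convey): one DIMM per channel, and 8 GB DIMMs
# on the co-processor side as well as the Xeon side.

DIMM_GB = 8                  # DIMM size quoted for the Xeon side
HOST_CHANNELS = 4            # DIMM channels for the single x64 processor
COPROC_CHANNELS = 16         # DIMM channels for the FPGA co-processor
COPROC_BANDWIDTH_GBS = 80    # aggregate bandwidth into the FPGA, GB/sec


def capacity_gb(channels, dimm_gb=DIMM_GB, dimms_per_channel=1):
    """Maximum memory capacity in GB for a given number of DIMM channels."""
    return channels * dimms_per_channel * dimm_gb


host_gb = capacity_gb(HOST_CHANNELS)
coproc_gb = capacity_gb(COPROC_CHANNELS)

# Even split of the aggregate bandwidth across channels (an idealization).
per_channel_gbs = COPROC_BANDWIDTH_GBS / COPROC_CHANNELS

print(f"Xeon side capacity:         {host_gb} GB")
print(f"Co-processor side capacity: {coproc_gb} GB")
print(f"Bandwidth per FPGA channel: {per_channel_gbs} GB/sec")
```

Under those assumptions the numbers line up with the article: 32 GB on the Xeon side, 128 GB on the co-processor side, and 5 GB/sec per memory channel into the FPGA.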