Bull waves red flag at HPC with blade supers
Never mind the bullx
Having seen its partner Sun Microsystems get the bulk of the 200-teraflops Juropa supercomputer blade cluster deal at Forschungszentrum Jülich, French server maker Bull is trying to position itself as the European favorite for future deals at places such as FZJ with a new line of Xeon-based blade supers called bullx.
Yes, they named their machines after testicles, unless you want to be generous and say that x is a variable. And even then, you can have all sorts of fun with that. (Amuse yourself while I get on with the feeds and speeds.)
The bullx line is, according to Bull, the first European-designed extreme-scale supercomputer, able to grow from teraflops to petaflops of number-crunching power. Bull says it has more than 500 supercomputing experts, who had input into the design, which was done in conjunction with some of its biggest customers (oil giant Total Fina and the French Commissariat à l'Énergie Atomique being the two biggies).
The bullx supers are packed into a fairly dense blade form factor and include Xeon processor modules as well as hybrid accelerator blades that mix Xeons with graphics processing unit (GPU) math co-processors from nVidia to boost certain kinds of calculations.
The bullx chassis is a 7U rack-mounted case that holds 18 half-height blade servers: ten across the bottom and eight across the top, leaving room in the middle of the upper row for electronics and other gadgetry. This gear includes a chassis management module and a 24-port Gigabit Ethernet switch for managing the blades, as well as a 36-port quad data rate (40 gigabit/sec) InfiniBand switch module.
The chassis has room for four power supplies (three plus a spare) and two fan units, and also has a device Bull calls an ultra capacitor module (not a flux capacitor, so don't get excited), which stores up enough juice to let a chassis full of gear ride out a power outage as long as 250 milliseconds. (This may not sound like a lot until you have a simulation running for two months and the server nodes go blinky and you have to start all over again.) But more importantly, the ultra capacitor module means, according to Bull, that in areas that have good, steady electrical power, HPC centers can do away with uninterruptible power supplies, which cost money and consume about 15 per cent of the aggregate power in an HPC cluster because of the inefficiencies of charging batteries.
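For a sense of scale, the ride-through the ultra capacitor provides is just energy equals power times time. The sketch below assumes a hypothetical 6 kW draw for a full chassis (Bull did not publish one) and a hypothetical 500 kW cluster for the UPS comparison; only the 250 millisecond and 15 per cent figures come from Bull.

```python
# Back-of-the-envelope ride-through and UPS figures for a bullx installation.
# Only HOLDUP_SECONDS and UPS_OVERHEAD come from Bull; the wattages are hypothetical.

HOLDUP_SECONDS = 0.250   # outage the ultra capacitor module rides out, per Bull
UPS_OVERHEAD = 0.15      # share of aggregate power lost to a UPS, per Bull

CHASSIS_WATTS = 6_000    # hypothetical draw of a fully loaded chassis
CLUSTER_KW = 500         # hypothetical size of a whole HPC cluster


def holdup_energy_joules(watts: float, seconds: float) -> float:
    """Energy a capacitor bank must deliver to carry a load through an outage (E = P * t)."""
    return watts * seconds


def ups_loss_kw(cluster_kw: float, overhead: float = UPS_OVERHEAD) -> float:
    """Power a UPS burns at the quoted overhead, which the capacitor approach avoids."""
    return cluster_kw * overhead


print(f"Capacitor energy per chassis: {holdup_energy_joules(CHASSIS_WATTS, HOLDUP_SECONDS):.0f} J")
print(f"UPS overhead avoided: {ups_loss_kw(CLUSTER_KW):.0f} kW")
```

At those assumed figures, each chassis needs only about 1.5 kilojoules on tap, whereas a UPS on the whole cluster would be wasting something like 75 kW continuously, if Bull's 15 per cent figure holds.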
The bullx B500 compute blades look a lot like other two-socket Xeon blade servers announced recently, but they are tweaked to support InfiniBand. The B500 blades are based on Intel's "Tylersburg" 5500 chipset and support the current "Nehalem EP" quad-core Xeon 5500 processors up to the X5570, which runs at 2.93 GHz but kicks out 95 watts. Given the X5570's price premium and the heat it generates, it is far more likely that HPC customers will opt for the E5540, which runs at 2.53 GHz, dissipates 80 watts peak, and costs about half as much per chip.
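A quick way to see why the cheaper part appeals is peak flops per watt. This sketch assumes each Nehalem EP core retires four double-precision flops per clock (a two-wide SSE add plus a two-wide SSE multiply) and treats the rated wattage as the actual power draw, which overstates typical consumption; the Xeon class here is purely illustrative, not any real library type.

```python
from dataclasses import dataclass


@dataclass
class Xeon:
    """Illustrative model of a quad-core Nehalem EP part (hypothetical helper class)."""
    name: str
    clock_ghz: float
    watts: float
    cores: int = 4
    flops_per_cycle: int = 4  # assumption: double-precision SSE add + multiply, two wide each

    def peak_gflops(self) -> float:
        return self.cores * self.clock_ghz * self.flops_per_cycle

    def gflops_per_watt(self) -> float:
        return self.peak_gflops() / self.watts


X5570 = Xeon("X5570", clock_ghz=2.93, watts=95)
E5540 = Xeon("E5540", clock_ghz=2.53, watts=80)

for chip in (X5570, E5540):
    print(f"{chip.name}: {chip.peak_gflops():.2f} gigaflops peak, "
          f"{chip.gflops_per_watt():.3f} gigaflops per watt")
```

On those assumptions the E5540 gives up about 14 per cent of the X5570's peak (40.48 versus 46.88 gigaflops per socket) while doing marginally better per watt, and at roughly half the price per chip that is an easy trade for most HPC buyers.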
The amount of memory the B500 compute blade supports depends on the memory speed you want. If you are cool with 1.07 GHz DDR3 main memory, you can plunk 96 GB into the 12 slots using 8 GB DIMMs, but if you want faster 1.33 GHz memory, you can use only six of the slots, for a maximum of 48 GB. (Given the wicked expense of 8 GB DIMMs, it seems far more likely that HPC shops will use cheaper 4 GB DIMMs.) Each blade sports a ConnectX converged server and storage InfiniBand adapter from Mellanox (which plugs into the PCI-Express 2.0 slot on the blade) and a two-port Gigabit Ethernet NIC. The blade has room for one SATA disk or SSD.
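The memory trade-off is simple slot arithmetic. This sketch uses only the slot counts given above (all 12 slots at 1.07 GHz, six at 1.33 GHz); the 4 GB rows are my own extrapolation from those counts, not figures Bull quoted.

```python
# Maximum main memory on a B500 blade, from the slot counts Bull gives:
# all 12 slots usable at 1.07 GHz, only six of them at 1.33 GHz.
SLOTS_BY_SPEED = {"1.07 GHz": 12, "1.33 GHz": 6}


def max_memory_gb(speed: str, dimm_gb: int) -> int:
    """Largest memory configuration for a given DDR3 speed and DIMM size."""
    return SLOTS_BY_SPEED[speed] * dimm_gb


for dimm_gb in (8, 4):
    for speed in SLOTS_BY_SPEED:
        print(f"{dimm_gb} GB DIMMs at {speed}: {max_memory_gb(speed, dimm_gb)} GB")
```

So a blade stuffed with the cheaper 4 GB sticks tops out at 48 GB at the slower memory speed, or 24 GB at the faster one.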
The B505 accelerator blade in the bullx HPC box is a double-wide blade that pairs a single two-socket Nehalem EP server with two Tesla M1060 co-processors. This blade is based on the 5520 variant of the Tylersburg chipset and has only six DDR3 memory slots for a total of 48 GB of main memory (using 8 GB DIMMs) running at 1.33 GHz.
There are apparently two chipsets on this accelerator blade and two ConnectX adapters, but only two processor sockets, which support up to the 2.93 GHz X5570 Nehalem EP chips. (The extra chipsets provide extra I/O, which keeps the GPUs well fed with data.) The accelerator blade has room for one SATA disk or SSD and has two Gigabit Ethernet ports. The GPUs and the Mellanox cards plug into PCI-Express 2.0 adapter slots.
The bullx chassis and the Xeon 5500 compute blades are available now, but the accelerator blades will not ship until November. Pricing was not announced for the bullx machines. A CPU-only configuration delivers 1.69 teraflops of peak number-crunching power per chassis, and a 42U rack with six chassis and 108 blades will peak at around 10 teraflops. So, about a hundred racks of these puppies and you are at 1 petaflops, and that is without resorting to GPUs.
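The per-chassis figure checks out if you assume the blades carry 2.93 GHz X5570s and that each Nehalem core retires four double-precision flops per clock. The sketch below (those assumptions are mine, not Bull's) works through the chassis, rack, and petaflops arithmetic and also shows what swapping in the cheaper E5540 does to the rack count.

```python
import math

# Article figures: 18 blades per 7U chassis, so six chassis fill a 42U rack.
BLADES_PER_CHASSIS = 18
CHASSIS_PER_RACK = 42 // 7
SOCKETS_PER_BLADE = 2
CORES_PER_SOCKET = 4
FLOPS_PER_CYCLE = 4  # assumption: Nehalem EP double-precision SSE, add + multiply per clock


def chassis_peak_tflops(clock_ghz: float) -> float:
    """Theoretical peak of one fully loaded CPU-only chassis, in teraflops."""
    blade_gflops = SOCKETS_PER_BLADE * CORES_PER_SOCKET * clock_ghz * FLOPS_PER_CYCLE
    return BLADES_PER_CHASSIS * blade_gflops / 1000


def racks_needed(target_tflops: float, clock_ghz: float) -> int:
    """Whole racks required to reach a target peak, rounding up."""
    rack_tflops = chassis_peak_tflops(clock_ghz) * CHASSIS_PER_RACK
    return math.ceil(target_tflops / rack_tflops)


for clock in (2.93, 2.53):  # X5570 and E5540
    chassis = chassis_peak_tflops(clock)
    print(f"{clock} GHz: {chassis:.2f} TF per chassis, "
          f"{chassis * CHASSIS_PER_RACK:.1f} TF per rack, "
          f"{racks_needed(1000, clock)} racks for a petaflops")
```

With X5570s that works out to 99 racks for a petaflops at peak; with E5540s it is 115 racks. Sustained Linpack results would come in lower than these theoretical peaks.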
The Tesla M1060 GPUs are the same ones that motherboard and whitebox server maker Super Micro just crammed into its own HPC server nodes, which are rack-style boxes, not blades. The M1060 GPU cards were announced at the beginning of June; each has 240 cores clocking at 1.3 GHz plus 4 GB of its own GDDR3 memory, and each is rated at 933 gigaflops on single-precision floating point calculations but only 78 gigaflops on double-precision math.
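Those two peaks fall out of the chip's layout, assuming the M1060 uses the GT200 design as I understand it: 240 single-precision cores grouped into 30 multiprocessors, each multiprocessor with a single double-precision unit, at a shader clock of 1.296 GHz. The flops-per-clock figures in the sketch are likewise my assumptions, not numbers from nVidia's announcement.

```python
# Tesla M1060 peak rates, assuming the GT200 layout (assumptions, not nVidia's spec sheet):
# 240 single-precision cores in 30 multiprocessors, one double-precision unit per multiprocessor.
CLOCK_GHZ = 1.296         # the "1.3 GHz" shader clock, assumed exact value
SP_CORES = 240
DP_UNITS = 30             # assumption: one DP unit per multiprocessor
SP_FLOPS_PER_CYCLE = 3    # assumption: dual-issued multiply-add (2 flops) plus multiply (1)
DP_FLOPS_PER_CYCLE = 2    # assumption: one fused multiply-add per DP unit

sp_gflops = SP_CORES * CLOCK_GHZ * SP_FLOPS_PER_CYCLE
dp_gflops = DP_UNITS * CLOCK_GHZ * DP_FLOPS_PER_CYCLE

print(f"single precision: {sp_gflops:.0f} gigaflops")   # 933
print(f"double precision: {dp_gflops:.0f} gigaflops")   # 78
print(f"ratio: {sp_gflops / dp_gflops:.0f} to 1")       # 12
```

That twelve-to-one gap is the crux: a B505 blade with two M1060s would peak at about 1.87 teraflops in single precision but only about 156 gigaflops in double precision.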
The lack of performance on double-precision math limits the appeal of the CPU-GPU hybrid, but nVidia is supposedly working on new packaging for its GPUs, due early next year, that will also sport something close to parity between single-precision and double-precision math. (I am guessing the new parts will plug into normal processor sockets.) When and if that happens, expect CPU-GPU hybrids to take off like mad.
Bull is supporting Red Hat Enterprise Linux 5 plus its own bullx Cluster Suite on the bullx HPC clusters, and is also supporting Microsoft's Windows HPC Server 2008. Given the popularity of Novell's SUSE Linux Enterprise Server in Europe, and especially among HPC shops, it seems odd that neither SLES 10 SP2 nor SLES 11 is supported yet.
According to a report in HPCwire, the CEA and the University of Cologne in Germany are the first two customers for the bullx boxes. The University of Cardiff, which currently buys Bull boxes, was trotted out as part of the bullx announcements to say that it will keep the bullx boxes in its thoughts as it plans to upgrade its current "Merlin" Xeon-based cluster, which is rated at 20 teraflops and which was installed last June. ®