Is HP's Gen8 good news for HPC?
Raw speed not always top consideration in enterprise servers
HPC blog HP trotted out its newest line of x86-based ProLiant systems this week. These new boxes, fueled by Intel Sandy Bridge processors, will sport speedy PCIe 3.0 slots and custom HP disk controllers (for tri-mirroring and error correction), and will provide a wide range of features aimed at improving system flexibility and manageability. Our pal Timothy Prickett Morgan outlines the systems here.
As Tim points out in his article, much of the innovation around the new Gen8 line is in the monitoring and manageability realm. This is a pretty good move by HP, since much of the pain (and cost) in enterprise computing arises from trying to tame growing numbers of systems handling increasingly complex business functions with the same, or fewer, heads.
HP is using the catch phrase “The world’s most self-sufficient line of servers” to describe a set of features and functions designed to improve performance, reduce planned and unplanned downtime, and almost double delivered compute-per-watt – while at the same time reducing the administration overhead. Much of this goodness comes from new Gen8 sensors and software that monitor more than 1,600 system parameters and take appropriate actions when necessary.
While these features will probably be welcomed by most enterprise data centers, conventional wisdom says that HPC customers typically eschew any sort of frippery that doesn’t give them more raw speed or power. They’re like the guys who will strip out every comfort feature in a car (radio, seat cushions, floor mats, fenders) in order to reduce weight and thus increase speed.
I had a conversation with Ed Turkel, HP’s HPC head honcho (or HPHPCHH for short), in which we discussed the Gen8 systems, their features, and the design philosophy behind them. We talked quite a bit about HP’s new IT management features, and whether they’re really relevant to HPC customers.
There’s no free lunch; all of this monitoring and automated management has a cost in terms of lost cycles. And that overhead, even if only a few percent, is probably unacceptable to the typical HPC customer. Moreover, they’ll definitely balk at paying more for the hardware to support these features, mainly because that money could be better used to buy more/faster CPUs or memory – or maybe a cool paint job of flames or a tiger or something.
It turns out that the vast majority of the monitoring and management functions are handled by HP’s iLO (Integrated Lights-Out) processor, an assist processor that’s buried deep inside every HP ProLiant system. So there aren’t any agents sitting around taking up cycles and memory space. Nor is there any extra load on the production network, since iLO can be set up on its own network.
The iLO processor is present on every HP ProLiant system these days, so there’s no additional charge for it. In Gen8, HP has doubled the number of sensors, so it now gets input from individual NICs and GPUs, in addition to a whole slate of other hardware/software sources. A major benefit of all these sensors is that they enable predictive failure analysis, alerting operators to problem components before they crap out completely.
Ed also highlighted HP’s Smart Update as one of his favorite features – it helps relieve headaches caused by trying to keep clusters current with up-to-date patches, updates, new BIOS releases, etc. With Smart Update, HP is gathering together all of its own hardware/software updates and major third-party updates (Linux drivers, for example), testing them to ensure they play well together, and then releasing them as a group every quarter.