Original URL: https://www.theregister.co.uk/2006/03/18/cray_cascade/
Cray to woo DARPA and others with server candy store
'Give us one of those, 200 of those . . .'
Anyone watching Cray slog it out in the supercomputing market probably saw this coming. The company has decided to converge its four separate product lines and center them around some common technology. In so doing, the cash-strapped Cray expects to save money on hardware design costs and to offer customers gear at a more affordable price.
Key to Cray's shift toward "adaptive supercomputing" will be its embrace of blade server-like technology. Instead of having unique systems running on standard (scalar), vector, multi-threaded and FPGA processors, Cray will build a single chassis and then slot processing blades into this system as ordered by customers. At the heart of the common hardware system will be Cray's I/O and networking technology.
Cray expects these new blades to go on sale around 2008 but doesn't plan to stop advancing this "bet the company plan" there. As part of its "Cascade" project bid for a lucrative DARPA contract, Cray is working on software that will break up workloads and automatically spread jobs across the different types of blades, depending on which system can crank through the code fastest. This type of technology should arrive around 2010, although bringing it to fruition will likely depend on whether or not Cray gets a piece of the $200m or more DARPA handout.
Cray currently sells four different types of systems - the XT3 (Opteron), X1E (vector), MTA (multi-threaded) and XD1 (FPGA accelerators).
The problem with this diverse lineup is that it actually prohibits some customers from buying the types of systems they really want, according to Cray SVP Jan Silverman.
A customer might, for example, want to base its purchase around the speedy XT3 box but still need a vector machine to handle certain jobs. So far, this has required the purchase of an entire vector system at a pretty high cost. Customers often can't afford to buy both sets of kit, and end up passing on the vector gear.
Cray's upcoming architecture could help solve this problem.
"Instead of spending about $1m on an entire vector machine, the customer can buy our cluster and fill it mostly full of Opteron blades and then - for $25,000 - throw a vector machine in there as well," Silverman said.
Heading to the singles bar
The first major push toward turning these plans into reality will hinge on Cray creating a single user interface for all its current systems. It expects to roll this software out by 2006, while performing the usual upgrades to its systems. Such hardware advances will include the XT4 Opteron-based box that will support quad-socket boards. In addition, Cray will ship a follow-on to the X1E with better tools for handling scalar code. It will also roll out a cheaper, third-generation MTA box and a low-end XD1 system.
During the blade phase, Cray will move to a common platform running Linux, a fresh version of its proprietary interconnect and more shared networking technology.
When (and if) Cascade arrives, Cray will take the whole plan to the next level by pumping systems full of different blades and then having an intelligent software package that can spread different workloads across the blades in order to ensure better performance. This strategy is what Cray is pitching to DARPA - which is also considering proposals from IBM and Sun. DARPA plans later this year to announce whether one or two of the companies will receive a massive grant meant to push the US's supercomputing capabilities a full generation ahead of what countries such as Japan and China have planned.
The core strengths of AMD's Opteron chip play well into Cray's plans. With one memory controller per chip, Opteron can scale well to four-socket systems and beyond. In addition, its open HyperTransport specification should help interest partners who can give Opteron a bit of a boost by providing custom technology that plays well in the supercomputing market. (More on this another day.)
So far, Cray is a bit hazy on exactly how it will create the software layer meant to divvy up workloads across the different blades. But it will basically require compiler software that can look out over applications and decide what parts of the applications are best suited to run on the general-purpose Opteron blades or to make their way to the more specialized blades.
"With a new set of tools at their disposal, that work across processor technologies, users will more easily map their applications to the new system," Cray said. "Over time, these tools will get better and better at running optimally out-of-the-box. But users will maintain control and have better analysis tools at their disposal."
So, at first, it seems that customers will want to flag chunks of code that are vectorizable or instruct their apps to tap a library of routines sitting on an FPGA accelerator. Then Cray will deliver packages that do much of this work for the customer.
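To make that two-stage idea concrete, here is a rough Python sketch of how such a placement layer might behave. It is purely illustrative and not anything Cray has described: the `Region` class, the `place` and `schedule` functions, the blade names and every runtime estimate are hypothetical. A chunk of code goes wherever the customer flagged it; if there is no flag, it goes to the blade type with the lowest estimated runtime.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical blade types, named after the four processor families in the article.
BLADE_TYPES = ("opteron", "vector", "multithreaded", "fpga")


@dataclass
class Region:
    """A chunk of an application, with invented runtime estimates per blade type."""
    name: str
    est_seconds: Dict[str, float]  # blade type -> estimated runtime (made-up numbers)
    hint: Optional[str] = None     # optional user flag, e.g. "vector"


def place(region: Region) -> str:
    """Pick a blade type: honor the user's flag if present, else take the fastest estimate."""
    if region.hint is not None:
        if region.hint not in BLADE_TYPES:
            raise ValueError(f"unknown blade type: {region.hint}")
        return region.hint
    # No flag: choose the blade type with the smallest estimated runtime.
    return min(region.est_seconds, key=region.est_seconds.get)


def schedule(regions):
    """Map each region name to the blade type chosen for it."""
    return {r.name: place(r) for r in regions}


regions = [
    Region("setup", {"opteron": 2.0, "vector": 5.0}),
    Region("stencil_loop", {"opteron": 40.0, "vector": 6.0}, hint="vector"),
    Region("graph_walk", {"opteron": 30.0, "multithreaded": 8.0}),
    Region("fft_library_call", {"opteron": 12.0, "fpga": 3.0}),
]

plan = schedule(regions)
for name, blade in plan.items():
    print(f"{name} -> {blade}")
```

In this toy version the flag always wins, which mirrors Cray's line about users keeping control. The automatic path is only as good as its estimates, and producing those estimates is the hard part the compiler work would have to solve.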
All told, Cray has a lot of work to do.
Cray has suffered through a couple of serious layoffs in recent years and struggled to make good on the Tera (MTA) and OctigaBay (XD1) acquisitions. In addition, Silverman noted that Cray may well not be able to pull off the "Cascade" vision should it fail to win the DARPA contract.
That said, Cray continues to have government customers wrapped around its finger. Only a few companies do the specialized supercomputing work desired by the large labs and agencies, and Cray is really the largest company of that select bunch. Even if the DARPA bid doesn't come through, Cray can probably keep milking these guys for a bit more cash and get close to the Cascade result. ®