Cray lands $70m super deals
Beats IBM in next-gen petaflop push
Cray has won two big supercomputer deals with a combined value of $70m just as it reported decent sales and a profit in its second quarter and raised its guidance for 2009.
The first deal is with Oak Ridge National Laboratory, upgrading its Jaguar XT5 massively parallel Opteron-Linux super to the latest six-core Istanbul Opteron processors from AMD. Oak Ridge is one of the many supercomputing research labs run by the US Department of Energy (DoE).
The Jaguar super is Cray's current flagship installation. It went into Oak Ridge in the summer of 2008 and debuted on the Top 500 supers list using Cray's XT4 interconnect and AMD's quad-core Opteron processors. The initial Jaguar installation used the quad-core Budapest Opteron processors running at 2.1GHz, which plugged into single-socket server boards.
This machine had a total of 31,328 compute cores and delivered a peak performance on the Linpack Fortran benchmark test of 260.2 teraflops and a sustained performance of 205 teraflops. This used the SeaStar2 interconnect that was an upgraded version of the SeaStar interconnect deployed in the Red Storm super at Sandia National Laboratories. This initial machine had 62TB of main memory across its nodes and 600TB of disk.
By the end of 2008, when a new XT5 machine was plunked down next to the XT4, Jaguar got quad-core Barcelona Opterons running at 2.3GHz, with 149,504 compute cores, giving the machine a sustained performance of 1.06 petaflops. This portion of the Jaguar machine uses the SeaStar2+ interconnect - better, stronger, faster - and has 300TB of memory and 6PB of disk. It is important to note that with the XT5, Cray shifted to two-socket motherboards, after getting totally screwed over by AMD thanks to the lateness of the Budapest quad-core Opterons used in the XT4.
When you add it all up, the Jaguar boxes - which can be clustered to share workloads - are basically neck-and-neck with IBM's Roadrunner hybrid Opteron-Cell Linux cluster over at Los Alamos National Laboratory, yet another DoE lab. Both machines have broken through the petaflops barrier and both vendors are now pushing up to 10 petaflops and beyond.
Under the deal Cray has inked with Oak Ridge, the XT5 machine will be upgraded to the new six-core Istanbul chips, boosting the core count to 224,256 on the compute nodes. The upgrade is expected to be completed on the XT5 partition on Jaguar by the end of the year and boost its peak performance to more than two petaflops. Neither Cray nor Oak Ridge said which Istanbul chips would be put into the XT5 partition of Jaguar, but it will probably be one of the standard Opteron parts announced in June, not one of the low-power or high-clock speed variants announced in mid-July.
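The jump in core count is what you get from a straight chip swap. Here is a minimal sketch of that arithmetic, assuming (neither Cray nor Oak Ridge spelled this out) that every quad-core chip in the XT5 partition is replaced one-for-one by a six-core part; the function name is purely illustrative:

```python
def upgraded_core_count(current_cores: int,
                        old_cores_per_chip: int,
                        new_cores_per_chip: int) -> int:
    """Scale a core count for a one-for-one chip swap (assumed, not confirmed)."""
    # Work out how many chips the current core count implies.
    chips, leftover = divmod(current_cores, old_cores_per_chip)
    if leftover:
        raise ValueError("core count is not a whole number of chips")
    # Same number of sockets, more cores in each one.
    return chips * new_cores_per_chip


# XT5 partition of Jaguar: 149,504 cores on quad-core chips, moving to six-core.
print(upgraded_core_count(149_504, 4, 6))  # 224256
```

The result lines up with the 224,256 figure quoted for the upgraded XT5 partition.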
The price difference between the 2.6GHz Opteron 2435 and the 2.4GHz Opteron 2431 is pretty high - $989 a pop versus $698 when buying in 1,000-unit quantities, or a 41.7 per cent price premium for an 8.3 per cent bump in performance - so you would guess Oak Ridge would go with the 2.4 GHz Istanbul chip. If it does, that would give the XT5 part of Jaguar a peak performance of 2.07 petaflops; boosting to the 2.6GHz Istanbul chip pushes the performance up to the 2.24 petaflops level. This upgrade is apparently worth just under $20m.
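The premium-versus-bump comparison is easy to check. A small sketch, using the 1,000-unit list prices and clock speeds quoted above and treating clock speed as a stand-in for performance (a simplification the comparison itself relies on); the helper name is made up for illustration:

```python
def percent_increase(base: float, new: float) -> float:
    """Percentage by which `new` exceeds `base`."""
    return (new - base) / base * 100.0


# Opteron 2431 (2.4GHz, $698) versus Opteron 2435 (2.6GHz, $989).
price_premium = percent_increase(698, 989)
clock_bump = percent_increase(2.4, 2.6)

print(f"price premium: {price_premium:.1f} per cent")  # 41.7
print(f"clock bump:    {clock_bump:.1f} per cent")     # 8.3
```

Put another way, each extra per cent of clock speed costs roughly five per cent more money, which is why the slower chip looks like the better bet.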
That was apparently the easier of the two announced deals for Cray to chase and win. The other, with Lawrence Berkeley National Laboratory's DoE-funded HPC center, called the National Energy Research Scientific Computing (NERSC) center, had pitted Cray against IBM for its next generation petaflops-scale supercomputer.
Cray won the deal with a multiyear contract that will see NERSC put in an XT5 and then upgrade it to a future Cray parallel super, presumably also based on Opteron processors. This deal is worth more than $50m, according to Cray, and the full machine will go into production by the end of 2010. NERSC currently has an XT4 box rated at 355 peak teraflops, nicknamed Franklin, which has 38,642 Budapest Opteron cores linked with the SeaStar2 interconnect.
In a conference call with Wall Street analysts to talk about its financials along with these two deals and future product plans, president and chief executive officer Peter Ungaro was asked how he felt about competitive wins against his old employer, IBM. "We are feeling pretty good about that right now," he said with a laugh.
Ungaro also talked vaguely about Cray's plans to upgrade its XT line of parallel supers and discussed its foray into the entry or personal supercomputer space with the CX1 Xeon-based baby blade super announced last September and its follow-on, the even less expensive CX1-LC that debuted last month.
First up, as you can see from the deals, Cray is ramping up its use of the Istanbul six-core Opterons. Cray had originally anticipated it would be able to get Istanbuls into the XT5 lineup at the end of the year, following a much later announcement by AMD, but the chip designer and seller was able to move up the Istanbul launch by a few months, and Cray was able to match the pace. Ungaro says that some early customers already have Istanbuls running in their XT machines and that support for the chips will be generally available in a few weeks.
The longer-term upgrade plan is a little vague, but the XT parallel supers will be upgraded sometime in the first half of 2010 and Ungaro characterized this upgrade as not being a big deal in terms of technology change. In the early part of the second half of 2010 - why Ungaro can't just say "in the third quarter with room for slippage" is beyond me - a major upgrade to the XT line is due, one with substantial changes in hardware and software and presumably based on the XT Opteron-Linux platform.
Ungaro also made it clear that Cray was not dropping its support for Opterons even though it did do a partnership deal with Intel back in April 2008 in the wake of the Budapest Opteron delays.
"Intel is part of the future roadmap in addition to AMD," Ungaro said.
He added that the future Intel-based supercomputers would come out around 2012 and would be part of a $250m contract that Cray won in November 2006 from the Defense Advanced Research Projects Agency (DARPA) alongside IBM, which got $244m. Both vendors have been tasked with putting petaflops-scale supers into the field by 2010 under the contract.
IBM will be fielding a parallel machine based on the eight-core Power7 chip running AIX and Linux, dubbed Blue Waters and set to be installed at the University of Illinois. Cray was planning on a merged product called Cascade that put its Opteron, XMT Tera multithreaded engines, vector, and field programmable gate array engines into a single architecture.
Now, the Cascade machine will apparently use future Xeon processors and heaven only knows what. While this Cascade box is going to be two years later than planned, Cray is clearly going to be able to get to the two petaflops performance level that the DARPA contract calls for without resorting to a new architecture.
Cray also has its eyes set a lot lower than this in the baby super business. On the CX1 front, Ungaro says that the company now has 25 resellers signed up to peddle the boxes - more than he ever expected to be interested - and that it will take some time to get them trained on the boxes and ramped up to sell them. Perhaps one quarter to train and one or two to start building a pipeline.
Ungaro said that there has been a lot of interest around the CX1 machines, but that it would take time to convert interest to revenue and that Cray did not expect the baby supers to deliver any material revenues in 2009.
Cray does not seem interested in putting out an Opteron-based version of the CX1, but given that sometimes Opterons do better than Xeons, and that it has learned to have two sources for chips in high-end servers the hard way, you would think that a CO1 baby super would be in the works. We'll see. ®