Cray bags $30m to upgrade Edinburgh super to petaflops-class

Hector XE6 to be upgraded to Archer XC30

Top 5 reasons to deploy VMware with Tegile

It is not much of a surprise, seeing as how the UK's national supercomputing service at the University of Edinburgh is a long-time Cray customer, that they would return to Cray to replace its existing XE6 system and replace it with a petaflops-class machine based on the latest Cray interconnect and Intel processor technology.

The new machine, to be called "Archer," will be based on Cray's latest XC30 iron, which pairs Intel's Xeon E5 processors with the "Aries" interconnect developed by the supercomputer maker through a contract with the US Defense Advanced Research Projects Agency.

The Engineering and Physical Sciences Research Council is providing the $30m in funding for the Archer system, which includes a production box with nearly three times the performance of the existing XE6 system, which has over 800 teraflops of aggregate number-crunching oomph and is nicknamed "Hector" – actually, they spell that "HECToR", for High-End Computing Terascale Resource, but we're not going to be drawn into such orthographic silliness.

The precise configuration details for the system were not provided by Cray or the University of Edinburgh, most likely because the machine will be based on the as-yet-unannounced "Ivy Bridge-EP" Xeon E5 processors from Intel. Earlier this year, Chipzilla said to expect the Xeon E5 chips to be launched in the third quarter, and early shipments for key cloud and supercomputer customers for the Ivy Bridge variants of these chips began several months ago.

The Hector system is comprised of 30 cabinets, which have a total of 704 of Cray's two-node XE6 blade servers. Each node on the blade has two sixteen-core "Interlagos" Opteron 6276 processors running at 2.3GHz. Each socket on the XE6 blade has 16GB of main memory, and with 2,816 compute nodes, that works out to 90,112 cores and 88TB of main memory.

The XE6 blade has two "Gemini" router interconnect chips, which implement a 3D torus across the nodes and allow it to scale up to multiple petaflops. The Hector machine also has a Lustre clustered file system that scales to over 1PB of capacity.

Schematic of the Archer production and development HPC systems

Schematic of the Archer development and production HPC systems

The XC30 supercomputer, developed under the codename "Cascade" by Cray with funding from DARPA's High Productivity Computing Systems program, started a decade ago. That funding came in two phases, with the initial lump sum of $43.1m being used to outline how Cray would converge various machines based on x86, vector, FPGA, and MTA multithreaded processors into a single platform. (GPU coprocessors had not become a thing yet in 2003 when the initial DARPA award came out.)

Three years later, DARPA gave Cray a $250m research award to further develop the Cascade system and its Aries Dragonfly interconnect as well as work on the Chapel parallel programming language. No one outside of Cray or DARPA knew it at the time, but the second part of the HPCS contract from DARPA originally called for Cray to take its multistreaming vector processors (all but forgotten at this point) and its massively multithreaded ThreadStorm processors (at the heart of the Urika graph analysis appliance) and combine them into a superchip.

But in January 2010, DARPA cut $60m from the Cascade contract and Cray focused on the very fast and expandable Dragonfly interconnect. The whole project cost $233.1m to develop, and now Cray has the right to sell iron based on that technology.

Cray got another $140m in April 2012 when it sold the intellectual property to the Gemini and Aries interconnects to Intel. Cray retains the rights to sell the Aries interconnect and is working with Intel on future interconnects, possibly codenamed "Pisces" and presumably used in the "Shasta" massively parallel systems that Intel and Cray said they are working on together as they announced the Aries interconnect sale.

The important thing as far as the University of Edinburgh is concerned is that the XC30 system has lots and lots of headroom – in fact, a fully loaded XC30 is designed to scale to well over 100 petaflops. But the UK HPC center is not going to be pushing the limits of the XC30 any time soon, with the Archer system having a little more than 2 petaflops of oomph. (Nearly three times the performance of the Hector machine, as the Cray statement explained.)

The plan for Archer, according to the bidding documents, calls for the university to get a test and development system as well as a much larger production system, with both boxes having compute nodes with standard memory but with a subset having fatter memory configurations.

A 56Gb/sec InfiniBand network links out to the Sonexion file systems (which has a total of 4.8PB of capacity and 100GB/sec of bandwidth into the system) and the login servers that sit in front of the production Archer machine. A 10Gb/sec Ethernet network hooks into other local storage and tape archiving, as well as to other network services.

The Archer deal includes the cost of the XC30 systems and the Sonexion storage as well as a multi-year services contract, all worth a combined $30m. (Yes, this kind of bundling makes it very tough to figure out what the system and storage hardware costs individually from services, and this is absolutely intentional.) Archer is expected to be put into production this year.

As we mentioned earlier, the Edinburgh HPC center is a long-time Cray customer, having installed a T3D parallel system in 1994 and added a T3E system in 1996 that peaked out at 309 gigaflops (you read that right) when it was retired in 2002. ®

Top 5 reasons to deploy VMware with Tegile

More from The Register

next story
NSA SOURCE CODE LEAK: Information slurp tools to appear online
Now you can run your own intelligence agency
Fat fingered geo-block kept Aussies in the dark
NASA launches new climate model at SC14
75 days of supercomputing later ...
Yahoo! blames! MONSTER! email! OUTAGE! on! CUT! CABLE! bungle!
Weekend woe for BT as telco struggles to restore service
Cloud unicorns are extinct so DiData cloud mess was YOUR fault
Applications need to be built to handle TITSUP incidents
BOFH: WHERE did this 'fax-enabled' printer UPGRADE come from?
Don't worry about that cable, it's part of the config
Stop the IoT revolution! We need to figure out packet sizes first
Researchers test 802.15.4 and find we know nuh-think! about large scale sensor network ops
SanDisk vows: We'll have a 16TB SSD WHOPPER by 2016
Flash WORM has a serious use for archived photos and videos
Astro-boffins start opening universe simulation data
Got a supercomputer? Want to simulate a universe? Here you go
prev story


Designing and building an open ITOA architecture
Learn about a new IT data taxonomy defined by the four data sources of IT visibility: wire, machine, agent, and synthetic data sets.
5 critical considerations for enterprise cloud backup
Key considerations when evaluating cloud backup solutions to ensure adequate protection security and availability of enterprise data.
Getting started with customer-focused identity management
Learn why identity is a fundamental requirement to digital growth, and how without it there is no way to identify and engage customers in a meaningful way.
High Performance for All
While HPC is not new, it has traditionally been seen as a specialist area – is it now geared up to meet more mainstream requirements?
Protecting against web application threats using SSL
SSL encryption can protect server‐to‐server communications, client devices, cloud resources, and other endpoints in order to help prevent the risk of data loss and losing customer trust.