China fires up homegrown petaflops super

The Sunway Bluelight special


The Chinese government has booted up the first of three homegrown, petaflops-class massively parallel supercomputers based on indigenous technology.

The Sunway Bluelight – or Divinity Blue-Ray depending on how you want to translate its name from Chinese – is based on a 16-core processor, rumored to be a derivative of the DEC Alpha 21164. Which is both odd and interesting at the same time.

The Sunway Bluelight machine is based on the SW1600 chip, according to various reports – The New York Times got the scoop in the English-speaking papers. The Bluelight super has 8,704 SW1600 processors, which are an offshoot of RISC processors designed by the Chinese military for its own use, presumably using technology licensed from Digital shortly before or after it was eaten by Compaq in 1998.

The Bluelight Special is installed at the National Supercomputer Center in Jinan, which is in China's Shandong province.

The details on the SW1600 chip are sketchy, but according to a presentation captured by IT168.com, the processors are 64-bit and run at between 975MHz and 1.1GHz, delivering somewhere between 124.8 and 140.8 gigaflops per chip.
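The quoted per-chip range lines up with a simple clock-times-width calculation. The sketch below assumes each of the 16 cores retires 8 floating point operations per clock cycle; that figure is inferred from the quoted numbers and is not stated in the reports.

```python
# Per-chip peak = clock (GHz) x cores x floating point ops per core per cycle.
CORES_PER_CHIP = 16      # from the article: 16-core processor
FLOPS_PER_CYCLE = 8      # ASSUMPTION: inferred from the quoted gigaflops, not confirmed


def chip_peak_gflops(clock_ghz, cores=CORES_PER_CHIP, flops_per_cycle=FLOPS_PER_CYCLE):
    """Return theoretical peak gigaflops for one chip at the given clock."""
    return clock_ghz * cores * flops_per_cycle


low = chip_peak_gflops(0.975)   # 975MHz
high = chip_peak_gflops(1.1)    # 1.1GHz
print(f"{low:.1f} to {high:.1f} gigaflops per chip")
```

Under that assumption, the two clock speeds reproduce both ends of the quoted range.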

According to a blog post by Oracle/Sun watcher Hung-Sheng Tsao, the Bluelight super has a peak theoretical performance of 1.07 petaflops and a sustained performance of 795.9 teraflops on the Linpack Fortran benchmark test that is commonly used to rate the raw performance of massively parallel machines doing number-crunching work.
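As a rough cross-check, the system peak can be rebuilt from the chip count and the per-chip figure, and the Linpack efficiency from the two quoted performance numbers. This is a back-of-the-envelope sketch using only figures quoted in this story; the variable names are illustrative.

```python
# Figures quoted in the article.
CHIPS = 8704
CHIP_PEAK_GFLOPS = 124.8       # low end of the quoted per-chip range
QUOTED_PEAK_TFLOPS = 1070.0    # 1.07 petaflops
LINPACK_TFLOPS = 795.9

# Peak rebuilt from chip count, converted from gigaflops to teraflops.
computed_peak_tflops = CHIPS * CHIP_PEAK_GFLOPS / 1000.0

# Share of the quoted peak actually delivered on Linpack.
linpack_efficiency = LINPACK_TFLOPS / QUOTED_PEAK_TFLOPS

print(f"Computed peak: {computed_peak_tflops:.1f} teraflops")
print(f"Linpack efficiency: {linpack_efficiency:.1%}")
```

The rebuilt peak lands a little above the quoted 1.07 petaflops; the reports do not explain the gap, and the Linpack run may not have used every processor.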

The machine also burns 1.074 megawatts of juice, which is impressively efficient, and has water blocks on key electronic components for cooling. Other machines in the petaflops class consume multiple megawatts these days.
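To put a number on that efficiency, divide the sustained Linpack result by the power draw. The sketch assumes the 1.074 megawatt figure covers the whole machine during the benchmark run, which the reports do not spell out.

```python
# Figures quoted in the article.
LINPACK_FLOPS = 795.9e12   # 795.9 teraflops sustained
POWER_WATTS = 1.074e6      # 1.074 megawatts (ASSUMPTION: whole-system draw under load)

# Sustained floating point operations per second, per watt, in megaflops.
mflops_per_watt = LINPACK_FLOPS / POWER_WATTS / 1e6

print(f"{mflops_per_watt:.0f} megaflops per watt")
```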

The Bluelight design puts two SW1600 processors onto a single card with 16GB of main memory soldered right onto the board so they can be packed densely. Four of these cards plug into a single 1U chassis, and a rack of these machines has 1,024 sockets or 16,384 cores.

To break through 1 petaflops, you need only nine racks of the Bluelight nodes, and these are arranged in an oval pattern – just to be different. The nodes are linked using Quad Data Rate (40Gb/sec) InfiniBand switches, and in this case, China is using a mix of 256-port and 324-port switches of unknown origin. The InfiniBand is used to link the nodes in a fat tree configuration with 2 microsecond latency in node hops.
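The packaging arithmetic can be checked from the quoted counts. The sketch below uses only numbers from this story; the chassis-per-rack figure is implied by them rather than reported, and the reports do not describe the rack layout.

```python
import math

# Figures quoted in the article.
CHIPS_TOTAL = 8704
SOCKETS_PER_RACK = 1024
CORES_PER_CHIP = 16
CHIPS_PER_CARD = 2
CARDS_PER_CHASSIS = 4

# Sockets in one 1U chassis, and how many chassis a full rack implies.
sockets_per_chassis = CHIPS_PER_CARD * CARDS_PER_CHASSIS
chassis_per_rack = SOCKETS_PER_RACK // sockets_per_chassis  # implied, not reported

# Core counts per rack and across the machine.
cores_per_rack = SOCKETS_PER_RACK * CORES_PER_CHIP
total_cores = CHIPS_TOTAL * CORES_PER_CHIP

# 8,704 chips fill eight and a half racks, so nine are needed.
racks_needed = math.ceil(CHIPS_TOTAL / SOCKETS_PER_RACK)

print(f"{cores_per_rack} cores per rack, {racks_needed} racks, {total_cores} cores in total")
```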

A video of the new machine is available from China Central Television.

The SW1600 is not the only indigenous chip that the Chinese government is investing in to be put into supercomputers and other devices. The Loongson (in English, Godson) processors, which are a licensed iteration on the 64-bit MIPS chips formerly controlled by Silicon Graphics, are part of another petaflops-class supercomputer that came to light in January 2010.

The Institute of Computing Technology at the Chinese Academy of Sciences revealed the feeds and speeds of the Godson chips at the International Solid-State Circuits Conference hosted by the IEEE back in February this year. The current Godson-3B is an eight-core chip, and the plan is to ramp it to sixteen cores.

This chip, which will have an x86 emulation mode, will be used in the Dawning 6000 supercomputer, which aims to break 1 petaflops as well when it is installed at the National Supercomputing Center in Shenzhen. This Dawning 6000 system is based on a blade server design rather than rack servers, with each blade holding an impressive sixteen chips for a total of 256 cores per blade – that's two eight-socket boards mounted end-to-end.

The Bluelight super was installed in September and revealed late last week. It should feature prominently in the next Top 500 supercomputer ranking that comes out at the SC11 conference in Seattle in two weeks. But it will not be at the top of the list. ®
