Godson: China shuns US silicon with faux x86 superchip
Who needs GPU co-processors?
With the Godson-3B, which is what Hu was there to talk about in San Francisco, ICT is sticking with the same 65 nanometer CMOS process and running the chip at the same 1 GHz. But the chip is bumped up to eight cores from four and has two 256-bit vector co-processors per core. The chip has two HyperTransport ports and two DDR3 memory controllers, and weighs in at 583 million transistors in a 300 square millimeter area. Running at 1 GHz, peak performance on those vector units is 128 gigaflops, with the chip only emitting 40 watts. According to early tests, the cores burn about 28.9 watts, while the uncore parts of the chip (HT, memory controllers, and crossbar switches for linking chips together) consume 11.1 watts.
According to Hu, the vector extension units in the Godson-3B and Godson-2H processors have 128-entry, 256-bit register files and support more than 300 SIMD instructions that have been added to the MIPS architecture.
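For readers who want to check the 128 gigaflops figure, here is a rough sketch of how it could fall out of the numbers above. The core count, vector unit count, vector width, clock speed, and wattages come from the article. The 64-bit operand size and the counting of a fused multiply-add as two operations per lane are assumptions made here for illustration, not something Hu or ICT stated.

```python
# Back-of-the-envelope check of the Godson-3B peak and power figures.
CORES = 8                    # from the article
VECTOR_UNITS_PER_CORE = 2    # from the article
VECTOR_WIDTH_BITS = 256      # from the article
CLOCK_GHZ = 1.0              # from the article

# ASSUMPTIONS (not in the article): 64-bit operands, and a fused
# multiply-add counted as two floating point operations per lane.
OPERAND_BITS = 64
FLOPS_PER_LANE_PER_CYCLE = 2

lanes = VECTOR_WIDTH_BITS // OPERAND_BITS
flops_per_cycle = (
    CORES * VECTOR_UNITS_PER_CORE * lanes * FLOPS_PER_LANE_PER_CYCLE
)
peak_gigaflops = flops_per_cycle * CLOCK_GHZ

CORE_WATTS = 28.9            # from the article (early tests)
UNCORE_WATTS = 11.1          # from the article (early tests)
total_watts = CORE_WATTS + UNCORE_WATTS
gigaflops_per_watt = peak_gigaflops / total_watts

print(f"peak: {peak_gigaflops:.0f} gigaflops")
print(f"power: {total_watts:.1f} watts")
print(f"efficiency: {gigaflops_per_watt:.1f} gigaflops per watt")
```

Under those assumptions the vector units land on the quoted peak, and the core and uncore power figures add up to the 40 watt total.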
Here's what the Godson-3B chip looks like:
The Godson-3B processor will be used in the Dawning 6000 petaflops supercomputer, which China will be tweaking in 2012. Here's an early version of the blade equipped for the Godson-3B chips:
Dawning's two-socket Godson-3A and Godson-3B blade server
And this is what the blade server chassis looks like for the Dawning 6000:
The Dawning 6000 supercomputer blade server chassis
The Dawning 6000 blade design is used by the National Supercomputing Center in Shenzhen for its hybrid Xeon 5650-Nvidia M2050 system, which ranked number three on the Top 500 list from November 2010. That machine had an aggregate 1.27 petaflops of sustained performance running the Linpack Fortran benchmark test.
Another Dawning 6000 blade cluster with 3,000 of the Godson-3B chips, and rated at around 300 sustained teraflops, is expected to be up and running this summer, Hu said. (That would be about 384 peak theoretical teraflops just counting the vector units, not the cores.)
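The peak number in that parenthetical, and the Linpack efficiency it implies, can be sketched as follows. The chip count and the sustained figure are Hu's, and the sustained figure is only "around 300," so the resulting ratio is approximate.

```python
# Peak versus sustained performance for the 3,000-chip cluster.
CHIPS = 3000                   # from the article
PEAK_GIGAFLOPS_PER_CHIP = 128  # vector units only, from the article
SUSTAINED_TERAFLOPS = 300      # "around 300" per Hu, so approximate

peak_teraflops = CHIPS * PEAK_GIGAFLOPS_PER_CHIP / 1000
efficiency = SUSTAINED_TERAFLOPS / peak_teraflops

print(f"peak: {peak_teraflops:.0f} teraflops")
print(f"sustained/peak: {efficiency:.0%}")
```

If the machine hits its target, that works out to a little under 80 per cent of theoretical peak.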
Those Dawning 6000 blades are by no means the highest density that ICT can come up with. Check out this system board for a 1U rack server that Hu showed off at ISSCC this week:
This IU2T system board packs 16 of the eight-core Godson-3B processors onto a single board, rated at 2 teraflops. A rack of these puppies would yield 42 teraflops. So instead of the hundreds of cabinets it can take to reach 1 petaflops of raw number-crunching performance with big x64-based machines, ICT could, in theory, do it with 24 racks.
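The board rating and the rack count can be checked with a few lines of arithmetic. This sketch takes the 42 teraflops per rack as quoted and does not try to guess how many boards go into a rack, since the article does not say.

```python
import math

# Board-level peak and racks needed for a petaflops.
CHIPS_PER_BOARD = 16           # from the article
PEAK_GIGAFLOPS_PER_CHIP = 128  # from the article
RACK_TERAFLOPS = 42            # per-rack figure as quoted in the article
TARGET_TERAFLOPS = 1000        # 1 petaflops

board_teraflops = CHIPS_PER_BOARD * PEAK_GIGAFLOPS_PER_CHIP / 1000
racks_needed = math.ceil(TARGET_TERAFLOPS / RACK_TERAFLOPS)

print(f"board peak: {board_teraflops:.3f} teraflops")
print(f"racks for 1 petaflops: {racks_needed}")
```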
ICT is not going to stop here. The Godson-3C design will shift to a 28 nanometer process and, according to the roadmap, will come in eight-core variants like the Godson-3B as well as a 16-core variant. The Godson-3C will have faster clock speeds, too, running at between 1.5 GHz and 2 GHz. ICT says the Godson-3C will deliver 512 gigaflops of raw performance on math work, and the way the math works, that is a doubling from moving from 1 GHz to 2 GHz and then a doubling again as the core count goes from 8 to 16. This chip is expected sometime around late 2012 or early 2013.
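The scaling argument for the Godson-3C can be written out as a quick sketch. It assumes the 16-core, 2 GHz part keeps the same vector throughput per core per clock as the Godson-3B, which is an assumption made here and not something ICT has confirmed.

```python
# Scaling the Godson-3B peak to the top-end Godson-3C.
GODSON_3B_GIGAFLOPS = 128      # from the article
GODSON_3B_CLOCK_GHZ = 1.0      # from the article
GODSON_3B_CORES = 8            # from the article

GODSON_3C_CLOCK_GHZ = 2.0      # top of the quoted 1.5 GHz to 2 GHz range
GODSON_3C_CORES = 16           # the larger of the two variants

# ASSUMPTION: per-core, per-clock vector throughput is unchanged.
clock_factor = GODSON_3C_CLOCK_GHZ / GODSON_3B_CLOCK_GHZ
core_factor = GODSON_3C_CORES / GODSON_3B_CORES
godson_3c_gigaflops = GODSON_3B_GIGAFLOPS * clock_factor * core_factor

print(f"Godson-3C peak: {godson_3c_gigaflops:.0f} gigaflops")
```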
Wouldn't it be funny if Silicon Graphics started building systems with these Godson-3 chips? They could dust off Irix and take it out for a spin on some new iron and allow it to run x64-based Linux applications in emulation mode. ®