China picks MIPS for super-duper super
Dawning of a new petafloppy day - perhaps
The Chinese government burst onto the supercomputing scene in a big way last November when the Tianhe-1 massively parallel cluster at the National Supercomputer Center in Tianjin came in at number five on the global HPC ranking with a hybrid Intel Xeon-AMD Radeon GPU box. But it looks like the future of petaflops computing in the Middle Kingdom may be a variant of the MIPS processor that's used in embedded applications such as routers and, in China at least, in Debian-based netbooks.
In terms of raw theoretical number-crunching throughput, Tianhe-1 has already pushed above 1.2 petaflops. But on the Linpack Fortran benchmark that's used to rank the Top 500 supercomputers in the world, Tianhe-1 could only push 563.1 teraflops. A lot of that oomph came from the graphics co-processors in the system, and the wide gulf between peak and sustained performance shows that there is still work to be done in CPU-GPU clusters.
The Tianhe-1 machine comprises 8,960 server nodes linked by a 20Gb/sec InfiniBand backbone, with each node having two Xeon 5500-class processors running at 2.53GHz (for a total of 71,680 cores) and two ATI Radeon HD 4870 graphics cards.
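For a rough sense of the gulf between peak and sustained performance, here is a back-of-the-envelope sketch in Python that uses only the figures quoted above. The four cores per Xeon are inferred from the 71,680-core total rather than stated outright, and the 1.2 petaflops peak is treated as a lower bound, so the efficiency it prints is a ceiling, not a measured value. The constant and function names are purely illustrative.

```python
# Figures quoted in the article.
NODES = 8_960
XEONS_PER_NODE = 2
CORES_PER_XEON = 4        # assumption: inferred from 71,680 / (8,960 * 2)

PEAK_TFLOPS = 1_200.0     # "above 1.2 petaflops", so a lower bound on peak
LINPACK_TFLOPS = 563.1    # sustained Linpack result


def total_cores(nodes: int, sockets: int, cores_per_socket: int) -> int:
    """CPU cores across the whole cluster (GPUs not counted)."""
    return nodes * sockets * cores_per_socket


def linpack_efficiency(sustained: float, peak: float) -> float:
    """Fraction of theoretical peak actually delivered on Linpack."""
    return sustained / peak


cores = total_cores(NODES, XEONS_PER_NODE, CORES_PER_XEON)
eff = linpack_efficiency(LINPACK_TFLOPS, PEAK_TFLOPS)

print(f"{cores:,} CPU cores")
print(f"Linpack efficiency: at most {eff:.1%}")
```

Because the real peak is somewhat higher than the rounded 1.2 petaflops used here, the machine delivers less than half of its theoretical throughput on Linpack.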
Looking ahead to real petaflops performance, Technology Review, the tech trade rag affiliated with the Massachusetts Institute of Technology, reports that the Institute of Computing Technology (ICT), part of the Chinese Academy of Sciences and an organization that has been funding the development of various MIPS processors since 2002, has tapped its own future Loongson-3 MIPS variants to be at the heart of the petascale Dawning 6000 super.
Weiwu Hu, chief architect of the Loongson processors developed by ICT, told Technology Review that the future Dawning 6000 super, presumably based on the quad-core Loongson-3 MIPS-style processor, would be finished by the middle of this year and operational by the end of 2010. ICT originally got access to MIPS technology by virtue of its partnership with wafer-baker STMicroelectronics, but last June it licensed the MIPS32 and MIPS64 architectures straight from MIPS Technologies, the chip-designing division of Silicon Graphics that was spun out in an initial public offering in 1998.
Be careful what technologies you let go of. Wouldn't it be ironic if parallel MIPS boxes started making it tough for x64 enthusiasts like SGI and Cray to sell parallel monsters?
The initial Loongson-1 processors were 32-bit chips at an unimpressive 266MHz, and the Loongson-2 went to 64-bit processing and was goosed as far as 1.2GHz. With the Loongson-2F chip in 2007, ICT came out with a design that has four cores (expandable to 16) with two floating-point units per core (one with a SIMD unit), plus 512KB of L2 cache and a DDR2 memory controller embedded on the chip.
It was these Loongson-2F chips that the University of Science and Technology of China used to make a 1-teraflops parallel super at the end of 2007 that cost something like $120,000. The Loongson-3 chip was supposed to come out last year with four cores and 4MB of L2 cache on the chip, but it slipped into this year. (Welcome to the joys of the chip biz, China.)
There is some speculation that ICT will actually plunk eight cores onto the Loongson-3 chips using 65-nanometer processes when it delivers the chip this year instead of the quad-cores expected last year. ICT did not confirm what variant of Loongson-3 would be used in the future Dawning 6000 cluster.
Interestingly, according to a paper published by the IEEE, written by the chip designers at ICT and entitled "Godson-3: A Scalable Multicore RISC Processor with x86 Emulation," the impending Chinese variant of the MIPS chip will be able to emulate x86 instructions. (Loongson and Godson seem to be synonymous; those are not two chip names.) The chip apparently has instructions added to help the QEMU hypervisor (the one that's at the heart of Red Hat's KVM) to translate instructions from x86 to MIPS format. According to early benchmarks, the emulation has about a 30 per cent penalty.
Such emulation, if it works well, could be not only interesting for PCs of various shapes and sizes, but also for supercomputing workloads.
The Chinese government is not, of course, the first organization to try to take the MIPS architecture back into the supercomputing world whence it came. SiCortex, for example, gave it a go with its innovative machines, but ended up peddling its assets last summer when the business didn't take off.
But the SiCortex machine didn't have an x86 emulation mode, and ICT might be on to something if x86 and MIPS code can be run on the same machine, perhaps supporting a mix of Windows and Linux workloads on a power-efficient box. Then again, the Dawning 6000 could be a kludge, something done for political reasons more than technical ones. ®