SGI readies first Project Mojo supers

Sticking it to x64 racks and blades


Supercomputer maker Silicon Graphics is just about finished with the initial designs of the "Project Mojo" dense-packed HPC machines, sources tell El Reg.

If you have been pondering how SGI was going to be able to cram a petaflops of supercomputing oomph into a rack of servers, as the company promised back in June it would be able to do within a year's time or so, the answer is something SGI is calling a stick as well as a new class of racks for the consolidated SGI and Rackable Systems server product lines. And the answer also appears to be that it can't quite cram a petaflops into a rack after all, but has nonetheless come up with a more compact design than traditional rack and blade servers offer.

Months before the US military's Defense Advanced Research Projects Agency issued its Exascale Challenge informally in June 2009 and formally in March 2010, the engineers at SGI, the result of the then newly merged supercomputer maker Silicon Graphics and hyperscale server maker Rackable Systems, were kicking around the idea of how they might cram a petaflops into a server rack. Even though SGI was measuring those petaflops in single-precision mode, which made the task a bit less difficult than going double precision, Project Mojo would still require a different approach from just plunking GPUs or other kinds of co-processors onto existing SGI rack and blade servers.

"We started the Project Mojo design with the GPU and PCI-Express form factor that they come in and wrapped the CPUs around them," explains Bill Mannel, vice president of product marketing at SGI, "rather than starting with an existing server first and then adding GPUs."

SGI was vague back in June about how it would package the CPUs and their co-processor accelerators, but it did say it planned to use FireStream GPUs from Advanced Micro Devices and Tesla GPUs from Nvidia for floating point jobs and massively multicored mesh processors from Tilera to accelerate integer processing for the main CPUs in the Project Mojo system.

As it turns out, the stick of the Project Mojo system is a computing element that is nearly as long as the rack is deep - three feet - with the width and a little more than the height of a double-wide PCI-Express peripheral card. Mannel wouldn't say what processor is implemented on the stick, but it is possible that SGI has variants with both Intel Xeon and AMD Opteron processors. Considering that Project Mojo is an experimental system with limited sales on the front end, it is reasonable to conjecture that SGI will start with Xeons and expand into Opterons if there is customer demand.

Each stick has room for two double-width fanless GPU co-processors and two processor sockets. Each socket gets its own GPU in the floating point models; it is unclear how many Tilera chips will be in the integer models.

The Project Mojo systems will come in two racks and with two different stick capacities. The high-end box will use a modified version of the 24-inch blade racks employed by the Altix UV 1000 supers, which are based on Intel's Xeon 7500 processors and SGI's NUMAlink 5 shared memory interconnect, while another will be based on a new 19-inch rack, code-named "Destination," that aims to replace the 20 different racks that SGI inherited from the merger of SGI and Rackable Systems. The modified 24-inch Altix UV rack will hold 80 sticks, each with two CPUs and two double-wide GPU co-processors. The 19-inch Destination rack will be able to hold 63 sticks.

Assuming SGI can employ the AMD FireStream GPUs announced in late June, which are based on the "Cypress" GPUs, in the Project Mojo boxes, the larger 24-inch rack machine using the double-wide FireStream 9370 should hit 422 teraflops of aggregate GPU performance and the smaller 19-inch rack should come in at 332.6 teraflops. The CPUs won't add much to the processing capacity.

Using Nvidia's double-wide, fanless Tesla M2070 GPUs, the Mojo stick will be rated at 2.06 teraflops in single precision, which adds up to 164.8 teraflops for the 24-inch rack and 129.8 teraflops for the 19-inch rack. The AMD FireStream 9370 has a huge single-precision advantage over Nvidia, but the AMD 9370 card weighs in at only 528 gigaflops doing double-precision math, compared to 515 gigaflops for the Tesla M2070. As for double precision, the biggest Project Mojo system will only deliver 82.4 teraflops with the Tesla M2070s and 84.5 teraflops with the FireStream 9370s.
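For anyone who wants to check the sums, here is a minimal sketch of the rack arithmetic. The single-precision per-card ratings are an assumption, back-computed from the totals quoted above (422 teraflops across 160 cards, 2.06 teraflops per two-card stick); the double-precision ratings are the ones given in the text. The names CARDS, RACKS and rack_teraflops are purely illustrative.

```python
# Per-card peak ratings in teraflops.
# "sp" values are back-computed from the article's totals (assumption);
# "dp" values are the double-precision figures quoted in the text.
CARDS = {
    "FireStream 9370": {"sp": 2.64, "dp": 0.528},
    "Tesla M2070": {"sp": 1.03, "dp": 0.515},
}

# Stick capacity of each rack, as stated in the article.
RACKS = {"24-inch": 80, "19-inch": 63}

# Each initial Mojo stick carries two double-wide GPU cards.
GPUS_PER_STICK = 2


def rack_teraflops(card, rack, precision="sp"):
    """Aggregate GPU peak for one rack: card rating x cards per stick x sticks."""
    return CARDS[card][precision] * GPUS_PER_STICK * RACKS[rack]


for card in CARDS:
    for rack in RACKS:
        sp = rack_teraflops(card, rack, "sp")
        dp = rack_teraflops(card, rack, "dp")
        print(f"{card}, {rack} rack: {sp:.1f} TF SP, {dp:.1f} TF DP")
```

These are peak GPU numbers only; as noted above, the CPUs are left out because they add little.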

It would make far more sense to put two processor sockets and four single-width fanless GPUs on the Mojo stick. Doing so using the AMD FireStream 9350 fanless GPU co-processors would yield 8 teraflops of oomph per stick, or 640 teraflops of aggregate GPU floating point performance at single precision. With four single-width Tesla M2050s, the 80-stick rack could deliver 329.6 teraflops of SP number crunching.
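The four-card alternative works out as follows. This is a sketch under stated assumptions: the per-card single-precision ratings are back-computed from the figures above (8 teraflops per four-card stick for the FireStream 9350, 329.6 teraflops per 320 cards for the Tesla M2050), and the function names are hypothetical.

```python
# Hypothetical stick layout discussed above: four single-width fanless GPUs.
CARDS_PER_STICK = 4
STICKS_PER_RACK = 80  # the modified 24-inch Altix UV rack

# Single-precision teraflops per card, back-computed from the article (assumption).
CARD_SP_TF = {"FireStream 9350": 2.0, "Tesla M2050": 1.03}


def stick_sp_teraflops(card):
    """Single-precision GPU peak of one four-card stick."""
    return CARD_SP_TF[card] * CARDS_PER_STICK


def rack_sp_teraflops(card):
    """Single-precision GPU peak of a full 80-stick rack."""
    return stick_sp_teraflops(card) * STICKS_PER_RACK


for card in CARD_SP_TF:
    print(f"{card}: {stick_sp_teraflops(card):.2f} TF per stick, "
          f"{rack_sp_teraflops(card):.1f} TF per rack")
```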

For now, SGI isn't saying what all the possible GPU co-processor configurations will be.

By the way, SGI never promised to have a petaflops of oomph out the door on day one, but merely said that there should be multiple ways of getting to a petaflops within one year's time. These initial Mojo sticks are just the first pass. That said, to hit even a petaflops at single precision, the Project Mojo sticks are going to have to more than double up GPU oomph.

An increase in GPU performance could prove to be problematic if AMD can't get the "Northern Islands" kickers to the current Cypress chips out the door by the end of this year, as planned. The rumors suggest that the Northern Islands GPUs being fabbed by Taiwan Semiconductor Manufacturing Co have indeed hit delays, and that there is some stopgap GPU, possibly to be fabbed by GlobalFoundries, called Southern Islands. Both AMD and Nvidia have been tight-lipped about their future GPU roadmaps, but they will probably start talking them up at the GPU Technology Conference in San Jose next week.

The Project Mojo sticks using Tilera co-processors could cram in a lot of integer punch. Tilera says that the performance of the Tile-Gx series of chips maxes out at 750 billion operations per second, which works out to five instructions per clock per core (using what everyone assumes is a modified MIPS RISC core) running at the top-end 1.5 GHz speed these future 100-core Tile-Gx100 chips will hit in 2011.

Assuming you could get eight Tilera 100-core chips on a Project Mojo stick and the same 80 sticks in a 24-inch rack, that works out to 480 trillion integer operations per second. You need a little more than twice this density to do integer math on the analog of a petaflops in floating point performance, which is a quadrillion (10^15) integer calculations per second. Luckily, Tilera is working on a 200-core chip, due around 2013, which should help SGI hit that goal.
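The integer math can be sketched the same way. The eight-chips-per-stick layout is the article's own supposition, not an SGI spec, and the five operations per clock is taken as a per-core figure, which is the reading that reproduces Tilera's 750 billion number for a 100-core chip. Variable names are illustrative.

```python
# Tile-Gx100 peak, built up from the figures in the text.
CORES_PER_CHIP = 100
CLOCK_HZ = 1_500_000_000        # top-end 1.5 GHz
OPS_PER_CLOCK_PER_CORE = 5      # per-core reading of "five instructions per clock"

# Supposed packaging: eight chips per stick, 80 sticks per 24-inch rack.
CHIPS_PER_STICK = 8
STICKS_PER_RACK = 80

TARGET_OPS = 10**15             # a quadrillion integer operations per second

chip_ops = CORES_PER_CHIP * CLOCK_HZ * OPS_PER_CLOCK_PER_CORE
rack_ops = chip_ops * CHIPS_PER_STICK * STICKS_PER_RACK
shortfall = TARGET_OPS / rack_ops  # density multiple still needed

print(f"per chip: {chip_ops / 1e9:.0f} billion ops/sec")
print(f"per rack: {rack_ops / 1e12:.0f} trillion ops/sec")
print(f"density increase needed: {shortfall:.2f}x")
```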

As El Reg previously reported, Tilera is working with Chinese PC maker and server wannabe Quanta to put eight half-width server nodes based on a single TilePro64 64-core processor running at 900 MHz into an SQ2 rack server with a 2U form factor. That 2U machine with eight server nodes is rated at 1.3 trillion integer operations per second. You would need 769 of these 2U servers, or over 18 racks, to hit that quadrillion integer operations performance level, using these TilePro64 chips and that tray server design.
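A quick sketch of that server count follows. The rack figure depends on how many 2U servers go in a rack, which is not stated above; SERVERS_PER_RACK below is a hypothetical value chosen because it reproduces "over 18 racks", so treat it as an assumption rather than a spec.

```python
import math

TARGET_OPS = 10**15                   # a quadrillion integer ops/sec
OPS_PER_SERVER = 1_300_000_000_000    # 1.3 trillion ops/sec per 2U box

servers_exact = TARGET_OPS / OPS_PER_SERVER
servers_rounded = round(servers_exact)      # the figure quoted in the text
servers_to_reach = math.ceil(servers_exact) # whole servers to actually reach the target

# Hypothetical packing density (assumption, not from the article).
SERVERS_PER_RACK = 42
racks = servers_rounded / SERVERS_PER_RACK

print(f"servers: {servers_exact:.1f} (rounded {servers_rounded}, "
      f"to reach target {servers_to_reach})")
print(f"racks at {SERVERS_PER_RACK} servers each: {racks:.1f}")
```

Change SERVERS_PER_RACK to see how sensitive the rack count is to the packing assumption.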

Mannel says that SGI has the design for the Project Mojo machines more or less done now, and the prototypes of the supers will be on display at the SC10 supercomputing conference in November down in New Orleans. The sticks and racks will begin their initial shipments in December. ®
