T-Platforms CPU-GPU hybrid hits 1.3 petaflops at Moscow State

Russian super maker invades Amerika

Designing a Defense for Mobile Applications

Moscow State University has moved into the upper echelons of the HPC field with an upgrade to its top-end supercomputer and moved to hybrid CPU-GPU blade servers from indigenous supercomputer maker T-Platforms.

It comes as no surprise that MSU has bulked up the math skills of the supercomputer, which is named after 18th century Russian polymath Mikhail Lomonosov, with Tesla GPU coprocessors from Nvidia. The Tesla GPU coprocessors, which are powered by the 512-core "Fermi" GPUs that are also used in Nvidia's graphics cards, are vastly preferred over FireStream alternatives from Advanced Micro Devices thanks to the CUDA programming environment and ECC scrubbing on the GDDR5 memory used on the GPUs. (AMD's FireStreams support OpenCL and do not have ECC graphics memory.)

The innovative T-Platforms blade servers that MSU is the primary customer for at this point – though that will change soon enough – were designed to support the stripped down Tesla X2070 and X2090 versions of the GPU coprocessors. But rather than wait for the X2090s, which Cray will ship in the third quarter in its XK6 ceepie-geepie hybrid supers, MSU and T-Platforms are going with the X2070s, which have the virtue of being ready to install now rather than waiting until later this year.

T-Platforms MSU data center

Moscow State's Lomonosov supercomputer

The Tesla M2090 fanless and X2090 embedded GPU coprocessors have all 512 cores etched on the Fermi GPUs activated and running at 1.3GHz, with memory running at 1.85GHz, and that yields 665 gigaflops at double-precision and 1.33 teraflops at single-precision with the 178GB/sec of memory bandwidth on the GDDR5 memory. The M2070 and X2070 that started shipping last May have only 448 out of the 512 cores running, and they spin at only 1.15GHz.

GDDR5 memory runs at 1.56GHz and offers only 148GB/sec of bandwidth, which is why the M2070 and X2070 GPU coprocessors are only rated at 515 gigaflops of double-precision and 1.03 teraflops single-precision. All four GPUs have 6GB of graphics memory and plug into PCI-Express 2.0 slots.

The upgrade to the Lomonosov super is a variant of the T-Blade blade server built by T-Platforms that El Reg told you about last September when they debuted. It was T-Platforms that outted Nvidia for even making an X2070 embedded GPU, much as Cray outted the X2090 long before Nvidia was ready to ship it. Based on the pictures available for the upgraded Lomonosov machine, it looks like T-Platforms has tweaked the blade design a bit while keeping the feeds and speeds the same.

T-Platforms T-Blade 2 TL

T-Platforms T-Blade 2 TL blade server

The T-Blade design tips main memory on its side on the memory boards, which means they lay flat. It also allows T-Platforms to cram more blades into a chassis and for heat sinks to be pressed right against the memory modules and other components as a single unit. The T-Blade 2 TL blade uses two of Intel's four-core, low-voltage Xeon L5630 processors, which is a step backward from the six-core Xeon 5670 processors used in Lomonosov before the upgrade.

Main memory in the blade is 12GB, which is half of what was used before the upgrade as well. With the GPUs doing the bulk of the computing, CPU cores and memory can be cut back without hurting overall performance, apparently. The blade has two X2070 GPU co-processors and two ConnectX-2 hybrid InfiniBand/Ethernet adapter cards, each of which have a 40Gb/sec InfiniBand (QDR, or quad data rate) port and a Gigabit Ethernet port, mounted on the blade motherboard.

T Platforms GP Blade Chassis

The hot end of the T-Blade 2 chassis with CPU-GPU blades installed

The T-Blade 2 chassis puts 16 of these nodes and two 36-port QDR InfiniBand switches from Mellanox into a single 7U chassis, with heat sinks sandwiched tightly between the blades and the whole shebang sucking about 12 kilowatts and delivering 17.5 teraflops.

By the way, the GPUs deliver about 16.5 teraflops of that oomph; the Xeons are there mostly to shepherd calculations to the GPUs. Last fall, Alexey Nechuyatov, director of product marketing at T-Platforms, said one of these chassis fully loaded would cost around $300,000.

The newly upgraded Lomonosov has 49 of these 7U chassis lashed together, with a total of 777 blades. (I am not sure why the 49th chassis only has nine blades, but there you have it.) That gives the system 1,554 X2070 GPUs and 6,216 Xeon cores with a total 850.5 teraflops of peak performance.

When you add up the number-crunching power in the 5,100 other Xeon 5500 and 5600 blades (based on the prior generations of T-Blade 1.5 and T-Blade 2 XN blades), that is 510 teraflops peak, and with these nodes all interconnected, you get 1.36 petaflops of aggregate oomph for climate modeling, drug design, industrial hydrodynamics, enzymology, turbulence modeling, and various chemical, physical, and biological simulations to frolic within.

Moscow State isn't the only Russian facility that's looking at GPU coprocessors. Keldysh Institute of Applied Mathematics has 192 Tesla C2050 GPUs doing simulations for atomic energy, aircraft design, and oil extraction. And Lobachevsky State University of Nizhni Novgorod (NNSU), which is Russia's first CUDA research center, is installing a cluster with 100 teraflops of GPU oomph this year and will push that up to 500 teraflops by the end of 2012.

Invading Amerika

With such compute density, T-Platforms said last year that it had a product that it thought it could sell to companies, universities, and laboratories in Western Europe and North America. And to get a toehold in the HPC market in the United States, T-Platforms has inked a reseller agreement with AEON Computing, a supercomputer supplier based in San Diego, California. ®

The Power of One eBook: Top reasons to choose HP BladeSystem

More from The Register

next story
Apple fanbois SCREAM as update BRICKS their Macbook Airs
Ragegasm spills over as firmware upgrade kills machines
Attack of the clones: Oracle's latest Red Hat Linux lookalike arrives
Oracle's Linux boss says Larry's Linux isn't just for Oracle apps anymore
THUD! WD plonks down SIX TERABYTE 'consumer NAS' fatboy
Now that's a LOT of porn or pirated movies. Or, you know, other consumer stuff
EU's top data cops to meet Google, Microsoft et al over 'right to be forgotten'
Plan to hammer out 'coherent' guidelines. Good luck chaps!
US judge: YES, cops or feds so can slurp an ENTIRE Gmail account
Crooks don't have folders labelled 'drug records', opines NY beak
Manic malware Mayhem spreads through Linux, FreeBSD web servers
And how Google could cripple infection rate in a second
FLAPE – the next BIG THING in storage
Find cold data with flash, transmit it from tape
prev story


Designing a Defense for Mobile Applications
Learn about the various considerations for defending mobile applications - from the application architecture itself to the myriad testing technologies.
How modern custom applications can spur business growth
Learn how to create, deploy and manage custom applications without consuming or expanding the need for scarce, expensive IT resources.
Reducing security risks from open source software
Follow a few strategies and your organization can gain the full benefits of open source and the cloud without compromising the security of your applications.
Boost IT visibility and business value
How building a great service catalog relieves pressure points and demonstrates the value of IT service management.
Consolidation: the foundation for IT and business transformation
In this whitepaper learn how effective consolidation of IT and business resources can enable multiple, meaningful business benefits.