T-Platforms CPU-GPU hybrid hits 1.3 petaflops at Moscow State

Russian super maker invades Amerika


Moscow State University has moved into the upper echelons of the HPC field with an upgrade to its top-end supercomputer that adds hybrid CPU-GPU blade servers from indigenous supercomputer maker T-Platforms.

It comes as no surprise that MSU has bulked up the math skills of the supercomputer, which is named after 18th century Russian polymath Mikhail Lomonosov, with Tesla GPU coprocessors from Nvidia. The Tesla GPU coprocessors, which are powered by the 512-core "Fermi" GPUs that are also used in Nvidia's graphics cards, are vastly preferred over FireStream alternatives from Advanced Micro Devices thanks to the CUDA programming environment and ECC scrubbing on the GDDR5 memory used on the GPUs. (AMD's FireStreams support OpenCL and do not have ECC graphics memory.)

The innovative T-Platforms blade servers, for which MSU is the primary customer at this point (though that will change soon enough), were designed to support the stripped-down Tesla X2070 and X2090 versions of the GPU coprocessors. But rather than wait for the X2090s, which Cray will ship in the third quarter in its XK6 ceepie-geepie hybrid supers, MSU and T-Platforms are going with the X2070s, which have the virtue of being ready to install now.


Moscow State's Lomonosov supercomputer

The Tesla M2090 fanless and X2090 embedded GPU coprocessors have all 512 cores etched on the Fermi GPU activated and running at 1.3GHz, with memory running at 1.85GHz. That yields 665 gigaflops at double precision and 1.33 teraflops at single precision, with 178GB/sec of bandwidth on the GDDR5 memory. The M2070 and X2070, which started shipping last May, have only 448 of the 512 cores running, and they spin at only 1.15GHz.

Their GDDR5 memory runs at 1.56GHz and offers only 148GB/sec of bandwidth, and with fewer cores running at a lower clock, the M2070 and X2070 GPU coprocessors are rated at only 515 gigaflops double precision and 1.03 teraflops single precision. All four GPUs have 6GB of graphics memory and plug into PCI-Express 2.0 slots.
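The peak ratings above fall straight out of core count times clock. The sketch below reproduces them under two assumptions that the article does not spell out: each Fermi CUDA core retires one fused multiply-add (two flops) per clock in single precision, and Tesla-class Fermi parts run double precision at half the single-precision rate. The function name `fermi_peak_gflops` is purely illustrative.

```python
# Peak-throughput sketch for the Fermi-based Tesla cards described above.
# Assumptions (not from the article): one fused multiply-add (2 flops) per
# CUDA core per clock in single precision, and half-rate double precision
# on Tesla-class Fermi parts. fermi_peak_gflops is a hypothetical helper.

def fermi_peak_gflops(cores: int, clock_ghz: float) -> tuple[float, float]:
    """Return (single_precision, double_precision) peak in gigaflops."""
    single = cores * clock_ghz * 2  # 2 flops per FMA, per core, per clock
    double = single / 2             # assumed half-rate double precision
    return single, double


# Active core counts and shader clocks as quoted in the article.
cards = {
    "M2090/X2090": (512, 1.30),
    "M2070/X2070": (448, 1.15),
}

for name, (cores, clock) in cards.items():
    sp, dp = fermi_peak_gflops(cores, clock)
    print(f"{name}: {sp:.1f} GF single, {dp:.1f} GF double")
```

Run as-is, the numbers land on the 1.33 teraflops/665 gigaflops and 1.03 teraflops/515 gigaflops ratings quoted for the two card families, which suggests the disabled cores and the lower clock account for the whole gap.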

The upgrade to the Lomonosov super is a variant of the T-Blade blade server built by T-Platforms that El Reg told you about last September when it debuted. It was T-Platforms that outed Nvidia for even making an X2070 embedded GPU, much as Cray outed the X2090 long before Nvidia was ready to ship it. Based on the pictures available for the upgraded Lomonosov machine, it looks like T-Platforms has tweaked the blade design a bit while keeping the feeds and speeds the same.


T-Platforms T-Blade 2 TL blade server

The T-Blade design tips the main memory modules on their sides on the memory boards, so they lie flat. That lets T-Platforms cram more blades into a chassis and press heat sinks right against the memory modules and other components as a single unit. The T-Blade 2 TL blade uses two of Intel's four-core, low-voltage Xeon L5630 processors, which is a step backward from the six-core Xeon 5670 processors used in Lomonosov before the upgrade.

Main memory in the blade is 12GB, which is also half of what was used before the upgrade. With the GPUs doing the bulk of the computing, CPU cores and memory can apparently be cut back without hurting overall performance. Mounted on the blade motherboard are two X2070 GPU coprocessors and two ConnectX-2 hybrid InfiniBand/Ethernet adapter cards, each of which has a 40Gb/sec InfiniBand (QDR, or quad data rate) port and a Gigabit Ethernet port.


The hot end of the T-Blade 2 chassis with CPU-GPU blades installed

The T-Blade 2 chassis puts 16 of these nodes and two 36-port QDR InfiniBand switches from Mellanox into a single 7U chassis, with heat sinks sandwiched tightly between the blades and the whole shebang sucking about 12 kilowatts and delivering 17.5 teraflops.

By the way, the GPUs deliver about 16.5 teraflops of that oomph; the Xeons are there mostly to shepherd calculations to the GPUs. Last fall, Alexey Nechuyatov, director of product marketing at T-Platforms, said one of these chassis fully loaded would cost around $300,000.
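The chassis numbers can be sanity-checked with a little arithmetic. This sketch multiplies out 16 blades, two X2070s per blade at the quoted 515 gigaflops, and two quad-core L5630s per blade, then derives peak per kilowatt and price per peak teraflop from the 12 kilowatt and $300,000 figures. The L5630 clock of 2.13GHz and four double-precision flops per core per clock are assumptions, not figures from the article, so treat the CPU share as approximate.

```python
# Back-of-envelope check on one fully loaded T-Blade 2 chassis of CPU-GPU blades.
# Assumptions (not from the article): Xeon L5630 at 2.13GHz and 4 double-
# precision flops per core per clock. Everything else is as quoted above.

BLADES_PER_CHASSIS = 16
GPUS_PER_BLADE = 2
X2070_DP_GFLOPS = 515.0

SOCKETS_PER_BLADE = 2
CORES_PER_SOCKET = 4
L5630_CLOCK_GHZ = 2.13    # assumed clock
DP_FLOPS_PER_CYCLE = 4    # assumed per-core throughput

CHASSIS_POWER_KW = 12.0
CHASSIS_PRICE_USD = 300_000

# GPU and CPU peak contributions, in teraflops.
gpu_tf = BLADES_PER_CHASSIS * GPUS_PER_BLADE * X2070_DP_GFLOPS / 1000
cpu_tf = (BLADES_PER_CHASSIS * SOCKETS_PER_BLADE * CORES_PER_SOCKET
          * L5630_CLOCK_GHZ * DP_FLOPS_PER_CYCLE) / 1000
total_tf = gpu_tf + cpu_tf

print(f"GPU share: {gpu_tf:.2f} TF, CPU share: {cpu_tf:.2f} TF, total: {total_tf:.2f} TF")
print(f"GPU fraction of peak: {gpu_tf / total_tf:.0%}")
print(f"Peak per kilowatt: {total_tf / CHASSIS_POWER_KW:.2f} TF/kW")
print(f"Price per peak teraflop: ${CHASSIS_PRICE_USD / total_tf:,.0f}")
```

Under these assumptions the GPUs supply roughly 94 per cent of the chassis peak, which squares with the point that the Xeons are mostly there to shepherd work to the GPUs.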

The newly upgraded Lomonosov has 49 of these 7U chassis lashed together, with a total of 777 blades. (I am not sure why the 49th chassis only has nine blades, but there you have it.) That gives the system 1,554 X2070 GPUs and 6,216 Xeon cores with a total of 850.5 teraflops of peak performance.

When you add up the number-crunching power in the 5,100 other Xeon 5500 and 5600 blades (based on the prior generations of T-Blade 1.5 and T-Blade 2 XN blades), that is 510 teraflops peak, and with these nodes all interconnected, you get 1.36 petaflops of aggregate oomph for climate modeling, drug design, industrial hydrodynamics, enzymology, turbulence modeling, and various chemical, physical, and biological simulations to frolic within.
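The system-level tallies follow from the per-chassis layout. This sketch assumes 48 full chassis of 16 blades plus one chassis holding nine, with two X2070s and two four-core Xeons per blade, and simply adds the two quoted peak figures rather than recomputing them. The constant names are illustrative only.

```python
# Tally the upgraded Lomonosov configuration from the per-chassis figures.
# Assumption: 48 full chassis of 16 blades plus one partial chassis of 9.
# The 850.5 and 510 teraflops peaks are taken from the article as quoted.

FULL_CHASSIS = 48
BLADES_PER_FULL_CHASSIS = 16
BLADES_IN_LAST_CHASSIS = 9
GPUS_PER_BLADE = 2
XEON_CORES_PER_BLADE = 2 * 4  # two four-core Xeon L5630s per blade

HYBRID_PEAK_TF = 850.5   # new CPU-GPU blades, as quoted
LEGACY_PEAK_TF = 510.0   # older Xeon 5500/5600 blades, as quoted

blades = FULL_CHASSIS * BLADES_PER_FULL_CHASSIS + BLADES_IN_LAST_CHASSIS
gpus = blades * GPUS_PER_BLADE
xeon_cores = blades * XEON_CORES_PER_BLADE
aggregate_pf = (HYBRID_PEAK_TF + LEGACY_PEAK_TF) / 1000

print(f"{blades} blades, {gpus} GPUs, {xeon_cores} Xeon cores")
print(f"Aggregate peak: {aggregate_pf:.2f} petaflops")
```

The counts come out at 777 blades, 1,554 GPUs and 6,216 Xeon cores, and the two partitions together sum to the 1.36 petaflops aggregate.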

Moscow State isn't the only Russian facility that's looking at GPU coprocessors. Keldysh Institute of Applied Mathematics has 192 Tesla C2050 GPUs doing simulations for atomic energy, aircraft design, and oil extraction. And Lobachevsky State University of Nizhni Novgorod (NNSU), which is Russia's first CUDA research center, is installing a cluster with 100 teraflops of GPU oomph this year and will push that up to 500 teraflops by the end of 2012.

Invading Amerika

With such compute density, T-Platforms said last year that it had a product that it thought it could sell to companies, universities, and laboratories in Western Europe and North America. And to get a toehold in the HPC market in the United States, T-Platforms has inked a reseller agreement with AEON Computing, a supercomputer supplier based in San Diego, California. ®
