Fujitsu parades 16-core Sparc64 super stunner

Top of the FLOPS

SC11 Ahead of the SC11 supercomputer conference in Seattle last week, recently awakened supercomputing giant Fujitsu rolled out the kicker: a commercialized version of the K supercomputer that is at the top of the flops charts in the world right now.

A whole lot of details on the Sparc64-IXfx processor and the PrimeHPC FX10 systems were missing, but El Reg has chased them down just as Fujitsu has announced its first paying customer for the FX10 machines.

The K supercomputer is the first machine in the world to break through the 10 petaflops performance barrier as gauged by the Linpack Fortran benchmark test. It was built by Fujitsu for the Japanese government and is installed at the Rikagaku Kenkyusho (RIKEN) research lab in Kobe, Japan.

The K super is based on the "Venus" Sparc64-VIIIfx processor designed by Fujitsu and fabbed by Taiwan Semiconductor Manufacturing Corp. The eight-core Venus chip clocks at 2GHz and delivers 128 gigaflops per chip, has a thermal efficiency of around 2.2 gigaflops per watt, and dissipates around 58 watts.
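The chip-level numbers hang together when you multiply them out. The minimal Python sketch below assumes each Venus core retires eight double-precision flops per clock. That figure is not stated here, but it is the one that reproduces the quoted 128 gigaflops. `peak_gflops` is a hypothetical helper.

```python
def peak_gflops(cores: int, flops_per_clock_per_core: int, clock_ghz: float) -> float:
    """Peak double-precision throughput: cores x flops per clock x clock rate."""
    return cores * flops_per_clock_per_core * clock_ghz

# Sparc64-VIIIfx "Venus": eight cores at 2GHz.
# Eight flops per clock per core is an assumption that matches the quoted 128 GF.
venus_peak = peak_gflops(cores=8, flops_per_clock_per_core=8, clock_ghz=2.0)
venus_gflops_per_watt = venus_peak / 58  # around 58 watts dissipated

print(f"{venus_peak:.0f} gigaflops, {venus_gflops_per_watt:.2f} gigaflops per watt")
```

This prints 128 gigaflops and about 2.21 gigaflops per watt, in line with the roughly 2.2 quoted.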

Some nodes of Fujitsu's PrimeHPC FX10 supercomputer

The K super has 22,032 four-socket blade servers fitted into 864 server racks to bring 705,024 cores to bear on parallel computation jobs. On the Linpack test, the K machine delivered 10.51 petaflops of sustained performance, which is 93.2 per cent efficiency as lined up against its peak theoretical performance of 11.28 petaflops. The Torus Fusion, or Tofu, 6D mesh/torus interconnect that Fujitsu has cooked up is no doubt one of the secret sauces in the K and FX10 supers.
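The core count and Linpack efficiency follow directly from the configuration. This Python sketch assumes eight flops per clock per core at 2GHz, or 16 gigaflops per core. That is an assumption on our part, chosen because it reproduces the quoted peak. The Linpack figures are in petaflops.

```python
blades = 22_032
sockets_per_blade = 4
cores_per_socket = 8  # eight-core Sparc64-VIIIfx

cores = blades * sockets_per_blade * cores_per_socket
# Peak per core: 8 flops/clock (assumed) at 2GHz = 16 gigaflops
peak_pflops = cores * 8 * 2.0 / 1e6  # gigaflops -> petaflops
rmax_pflops = 10.51                  # sustained Linpack result
efficiency = rmax_pflops / peak_pflops

print(cores, round(peak_pflops, 2), f"{efficiency:.1%}")
```

The output is 705,024 cores, an 11.28 petaflops peak and 93.2 per cent efficiency, matching the figures above.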

The PrimeHPC FX10 super uses double-stuffed 16-core Sparc64 processors, also designed by Fujitsu and fabbed by TSMC, and can scale up to 1,024 racks, compared with the 864 racks in the K machine.

Most of the feeds and speeds of the Sparc64-IXfx processor were not available two weeks ago when Fujitsu jumped the gun on the SC11 conference. We knew that the chip has 16 Sparc cores running at 1.85GHz and that it delivers 236 gigaflops of double-precision floating point number crunching. Now we know what the chip looks like and a bit more about its innards.

Fujitsu's Sparc64-IXfx processor

The Sparc64-IXfx chip has 85GB/sec of memory bandwidth and includes 12MB of L2 cache memory on the chip that is shared by all 16 of those cores. Fujitsu is not implementing a ring interconnect for those cores, as Intel is doing for future Xeon and Itanium processors, but rather is plunking a big L2 cache memory controller in the dead center of the chip and wrapping four banks of L2 cache memory around it. Two banks of cores are on the chip, top and bottom, with a DDR3 main memory controller implemented on each side of the L2 cache banks with memory interfaces out to the memory DIMMs.
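One rough way to size that memory subsystem against the compute is bytes of main memory bandwidth per peak flop. The Python sketch below is back-of-envelope arithmetic only. It assumes bandwidth and L2 cache are shared evenly across all 16 cores, which real workloads won't do.

```python
# Figures from the text; the peak is the quoted 236 gigaflops per chip.
mem_bw_gb_per_s = 85.0
peak_gflops = 236.0
cores = 16
l2_cache_mb = 12.0

bytes_per_flop = mem_bw_gb_per_s / peak_gflops  # main-memory bytes per peak flop
bw_per_core = mem_bw_gb_per_s / cores           # GB/sec per core if evenly shared
l2_per_core_mb = l2_cache_mb / cores            # shared L2, averaged per core

print(f"{bytes_per_flop:.2f} bytes/flop, {bw_per_core:.2f} GB/sec "
      f"and {l2_per_core_mb:.2f}MB of L2 per core")
```

That comes to about 0.36 bytes per flop, roughly 5.3GB/sec and 0.75MB of L2 per core.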

The cores on the Sparc64-IXfx processor have 32KB of L1 data cache and 32KB of L1 instruction cache. Each core has two integer units, two load/store units, and four floating point units, each of which can perform two floating point operations (a multiply and an add) per clock. The chip can also allow a fat SIMD instruction to span two floating point units. Together, the 16 cores can do 128 floating point operations per clock, and at just a hair under 1.85GHz, that works out to 236 gigaflops of peak theoretical performance.
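The peak figure is cores times floating point units times operations per unit per clock times clock speed. This Python sketch assumes a clock of exactly 1.848GHz. The text only says "a hair under 1.85GHz", so that exact value is our assumption.

```python
cores = 16
fp_units_per_core = 4
flops_per_unit_per_clock = 2  # e.g. a multiply plus an add
clock_ghz = 1.848             # assumed; "just a hair under 1.85GHz"

flops_per_clock = cores * fp_units_per_core * flops_per_unit_per_clock
peak_gflops = flops_per_clock * clock_ghz

print(flops_per_clock, round(peak_gflops, 1))
```

This gives 128 flops per clock and about 236.5 gigaflops, which rounds to the quoted 236.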

The Sparc64-IXfx chip is implemented in a 40 nanometer process from TSMC and the die is nearly perfectly square at 21.9 millimeters by 22.1 millimeters. The chip has 1.87 billion transistors and 1,442 signal pins. During normal operations, Fujitsu says that the Sparc64-IXfx processor will burn about 110 watts.
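The die figures give a transistor density, and the power figure gives a rough efficiency. Dividing peak flops by a "normal operations" power draw is only a loose proxy, which is our simplification rather than a Fujitsu metric. The Python sketch below uses the quoted 236 gigaflops peak.

```python
die_w_mm, die_h_mm = 21.9, 22.1
transistors = 1.87e9
watts = 110.0         # typical draw during normal operations
peak_gflops = 236.0   # quoted peak per chip

die_area_mm2 = die_w_mm * die_h_mm
mtransistors_per_mm2 = transistors / die_area_mm2 / 1e6
gflops_per_watt = peak_gflops / watts

print(f"{die_area_mm2:.0f} mm^2, {mtransistors_per_mm2:.2f}M transistors/mm^2, "
      f"{gflops_per_watt:.2f} GF/W")
```

That is roughly 484 square millimetres, about 3.86 million transistors per square millimetre, and around 2.15 gigaflops per watt. The efficiency figure is in the same neighbourhood as the Venus chip's 2.2.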

At the top of the chip is an interface to the Tofu interconnect. Each processor socket in the K or FX10 machine has its own Tofu interconnect chip. This interconnect chip has a processor bus to link back to the Sparc64-IXfx processor and four Tofu network interfaces that handle packets coming off the processor and also provide remote direct memory access (RDMA), as InfiniBand does.

The interconnect chip has a Tofu barrier interface that handles collective operations, and a Tofu network router that has ten Tofu links. These links are used to hook the Tofu interconnect chips to up to ten other interconnect chips in the cluster, implementing the 6D mesh/torus when all the links are used.
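Why ten links for six dimensions? A full 6D torus would give each node twelve neighbours, one in each direction along each axis. However, an axis only two nodes long has just one distinct neighbour. The Python sketch below counts distinct neighbours with wraparound on every axis. It assumes the inner a, b and c axes are sized 2, 3 and 2, which is our understanding of the Tofu design rather than something stated here. The x, y and z sizes are made up for illustration. `torus_neighbors` is a hypothetical helper, and treating every axis as a torus (rather than some as a mesh) is a simplification.

```python
def torus_neighbors(coord, dims):
    """Distinct neighbours of `coord` on a torus with wraparound on every axis."""
    neighbors = set()
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % size
            if tuple(n) != tuple(coord):  # an axis of size 1 adds no neighbour
                neighbors.add(tuple(n))
    return neighbors

# Hypothetical machine shape: x, y, z are illustrative; a=2, b=3, c=2 is assumed.
dims = (4, 6, 8, 2, 3, 2)
origin = (0, 0, 0, 0, 0, 0)
print(len(torus_neighbors(origin, dims)))  # 10
```

The x, y, z and b axes each contribute two neighbours. The a and c axes, being only two nodes long, contribute one apiece, for ten in all.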

The interconnect chip also has a PCI-Express 2.0 peripheral controller for linking out to storage and other peripherals. The interconnect chip is implemented in a fairly ancient 65 nanometer process and runs at 312.5MHz, which is a little more than one sixth the clock speed of the processor. It has ten bi-directional ports running at 5GB/sec, thus delivering a peak of 100GB/sec of switching capacity.
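The switching figure is simple multiplication, assuming the 5GB/sec rating applies to each direction of each port. The clock ratio in the Python sketch below uses an assumed processor clock of 1,848MHz, our reading of the "hair under 1.85GHz" figure.

```python
ports = 10
gb_per_s_per_direction = 5.0  # assumed to be per direction, per port
switching_gb_per_s = ports * gb_per_s_per_direction * 2  # both directions

tofu_clock_mhz = 312.5
cpu_clock_mhz = 1_848  # assumed processor clock
clock_ratio = tofu_clock_mhz / cpu_clock_mhz

print(switching_gb_per_s, round(clock_ratio, 4), clock_ratio > 1 / 6)
```

That yields 100GB/sec and a ratio of about 0.169, just above one sixth (about 0.167).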

You have to think that Fujitsu wants to put the Tofu controller on the future Sparc64-Xfx processor, if there is such a thing. Or at least get it on the same chip package to further increase the density of the PrimeHPC clusters.

The PrimeHPC blade server with Tofu interconnect chips on the left

As with the K supers, there are four Sparc64-IXfx processors on each blade in the FX10 machine, with four matching Tofu interconnect chips. All eight chips on the blade are cooled with water blocks, which are attached to rear-door water jackets on the PrimeHPC racks.

The base PrimeHPC FX10 machine has 64 racks, as it turns out, and a loaded-up rack costs about ¥50m, or roughly $650,000 (£414,000). Those 64 racks have 6,144 compute nodes (four per blade) with 384TB of memory and 1.4 petaflops of peak number-crunching power; this configuration also has 384 I/O nodes, which have a total of 1,536 expansion slots.
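The base configuration's numbers can be cross-checked against one another. This Python sketch rests on two assumptions of ours: 236.5 gigaflops per node (the chip's peak, rounded) and binary units for the 384TB memory figure. It prices the racks at the roughly $650,000 list figure.

```python
racks = 64
compute_nodes = 6_144
nodes_per_rack = compute_nodes // racks       # 96 nodes per rack
blades_per_rack = nodes_per_rack // 4         # four nodes per blade
mem_per_node_gb = 384 * 1024 / compute_nodes  # 384TB read as binary units (assumption)
peak_pflops = compute_nodes * 236.5 / 1e6     # 236.5 GF per node (assumption)
slots_per_io_node = 1_536 // 384              # expansion slots per I/O node
list_price_usd = racks * 650_000              # approx. $650,000 per loaded rack

print(nodes_per_rack, blades_per_rack, mem_per_node_gb,
      round(peak_pflops, 2), slots_per_io_node, list_price_usd)
```

That works out to 96 nodes on 24 blades per rack, 64GB per node, and about 1.45 petaflops, which the text rounds to 1.4. It also gives four slots per I/O node and a list price of about $41.6m for the 64-rack base machine.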

This machine has about the same power efficiency as the K super, and burns 1.4 megawatts. A fully loaded 1,024-rack system would have 98,304 compute nodes, 6PB of main memory, and deliver 23 petaflops of oomph while burning 23 megawatts. Such a box would cost about $665.6m at list price, but we're pretty sure Fujitsu will cut you a deal.
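Scaling the 64-rack base machine up sixteenfold to 1,024 racks roughly reproduces the headline figures. The Python sketch below rests on four assumptions of ours, not Fujitsu's stated method:

- 96 compute nodes per rack
- 64GB per node in binary units
- 236.5 gigaflops per node
- power that scales linearly from the 1.4MW base machine

```python
base_racks, max_racks = 64, 1_024
scale = max_racks // base_racks        # 16x the base configuration
nodes = 96 * max_racks                 # 96 compute nodes per rack (assumption)
mem_pib = nodes * 64 / 1024 / 1024     # 64GB per node, binary units (assumption)
peak_pflops = nodes * 236.5 / 1e6      # 236.5 GF per node (assumption)
power_mw = 1.4 * scale                 # linear scaling of the 64-rack draw

print(nodes, mem_pib, round(peak_pflops, 1), round(power_mw, 1))
```

The result is 98,304 nodes, 6PB of memory and about 23.2 petaflops. Linear scaling gives about 22.4MW, in the same ballpark as the quoted 23MW.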

Fujitsu is ready to ship the PrimeHPC FX10 machines starting in January 2012, and the University of Tokyo's supercomputing division is the first customer to buy a PrimeHPC FX10 machine. The university is buying a 50-rack setup with 4,800 Sparc64-IXfx nodes, 150TB of memory, and 1.13 petaflops of oomph. The FX10 machine at the University of Tokyo is front-ended by 16 Primergy RX200 S6 and 58 Primergy RX300 S6 servers, which are being used as access controllers to the 1.13 petaflops monster.
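The University of Tokyo configuration implies half the memory per node of the 64-rack base machine. This Python sketch assumes 96 compute nodes per rack, 236.5 gigaflops per node and binary units for the 150TB figure, all of which are our assumptions.

```python
racks = 50
nodes = racks * 96                     # 96 compute nodes per rack (assumption)
mem_per_node_gb = 150 * 1024 / nodes   # 150TB read as binary units (assumption)
peak_pflops = nodes * 236.5 / 1e6      # 236.5 GF per node (assumption)

print(nodes, mem_per_node_gb, f"{peak_pflops:.3f}")
```

That gives 4,800 nodes at 32GB each and about 1.135 petaflops, consistent with the quoted 1.13.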

The cluster is backed by 150 Eternus DX80 S2 RAID 5 storage arrays with 1.1PB of capacity, which are connected to the nodes directly, and 80 Eternus DX410 S2 arrays that are implemented using RAID 6 protection across their collective 2.1PB of capacity and shared by all nodes in the cluster.

The whole shebang runs the Fujitsu Exabyte File System, which also made its debut ahead of the SC11 show. FEFS is a variant of the open-source Lustre file system, and Fujitsu has committed to giving its enhancements to Lustre back to the community through a partnership with Whamcloud.

The latter company is offering third-party support for Lustre, which is technically controlled by Oracle since its acquisition of Sun Microsystems nearly two years ago. But Oracle doesn't care about HPC and therefore Whamcloud has forked Lustre and is offering support services to keep the big supercomputing labs of the world happy.

Fujitsu said it wanted to sell 50 of the PrimeHPC FX10 systems in the next three years, predominantly as development machines for institutions that want to deploy applications on the K machine. One down, 49 to go. ®
