UV 2: RETURN of the 'Big Brain'. This time, it's affordable

Hefty loads bursting out of your box? Try this

Silicon Graphics is betting big on Intel's latest Xeon E5-4600 processor and its own revved up NUMAlink 6 shared memory interconnect, creating a "big brain computer" that can gang up to 4,096 cores into a single system image to run massive Linux workloads and fairly large Windows jobs, too. The new UV 2 is exactly the kind of box, says SGI, that customers with big data warehouse, big database, big data, and traditional HPC workloads have always wanted – and in many cases could never have afforded.

But the shift from Itanium and then Xeon E7 chips to new packaging and lower-cost Xeon E5 processors has made SGI's shared memory systems more broadly accessible – at just the same time that many workloads seem to be busting out of general-purpose four-socket boxes. This is good news for SGI, which has had its share of financial woes as it chases the capricious and fiercely competitive HPC and hyperscale data center markets.

SGI will also be pleased to note that Intel has not yet got interconnect fabrics woven into its Xeon processors and chipsets, although it is clearly working on that with its acquisition of Cray's family of HPC interconnects back in April, its purchase of the InfiniBand chip and switch business from QLogic in January, and its pickup of Ethernet switch chip maker Fulcrum Microsystems back in July 2011.

However, SGI still has a good window in which to capitalize on its NUMAlink interconnect before Intel does whatever it's going to do to integrate interconnects with its CPUs and chipsets. It would not be surprising to see SGI sell the NUMAlink biz to Intel for a big chunk of change, or maybe even to an acquisitive Advanced Micro Devices or Hewlett-Packard. In fact, it would not be surprising at all if HP just upped and bought SGI to get out of its Itanium conundrum with Oracle. But so far, SGI seems content to go it alone and to peddle rack and shared memory systems all by its lonesome.

A rack's worth of SGI's UV 2000 supercomputer

SGI put out a bit of a preview of the UV 2 lineup when Intel launched the Xeon E5-4600 processors a little more than a month ago. At the time, the company said that it was switching away from the Xeon 7500 and E7 and their multiple QuickPath Interconnect (QPI) ports. SGI also said it was moving away from the "Boxboro" 7500 chipset that it had used to interface with the NUMAlink 5 interconnect for lashing nodes tightly together in a memory-coherent fashion. The UV 1000 high-end machines were based on a two-socket blade.

The Xeon 7500 and E7 chips have four QPI ports coming off each socket, and the original UV 1000 design used two QPI ports on the Xeon 7500 or E7 chips to cross-link the two sockets together, with one of the remaining two QPI ports going to the Boxboro chipset (which controls access to main memory and local I/O slots on the blade) and the other linking out to the NUMAlink 5 hub, which in turn has four links out to the NUMAlink 5 router. That router implements an 8x8 (paired node) 2D torus that can deliver up to 16TB of shared memory space across those 256 sockets.
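
To picture how that torus hangs together, here is a minimal Python sketch – our own illustration, with a hypothetical torus_neighbors helper, not SGI's routing logic – of the wrap-around neighbours each node gets in an 8x8 2D torus:

    # Illustrative only: neighbour lookup in an 8x8 2D torus, the topology
    # the NUMAlink 5 routers implement across the UV 1000's paired nodes.
    # This is not SGI's actual routing code.
    def torus_neighbors(x, y, dim=8):
        """Return the four wrap-around neighbours of node (x, y)."""
        return [
            ((x - 1) % dim, y),  # west, wrapping around the ring
            ((x + 1) % dim, y),  # east
            (x, (y - 1) % dim),  # south
            (x, (y + 1) % dim),  # north
        ]

    # Even a corner node has four neighbours; those wrap links are what
    # separate a torus from a plain mesh and cap worst-case hop counts.
    print(torus_neighbors(0, 0))  # [(7, 0), (1, 0), (0, 7), (0, 1)]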

While SGI let it be known a month ago that it was ditching the Xeon E7s for the E5-4600s in the next-generation UV 2000 shared memory supers, the company did not say exactly how it was going to build these machines. (SGI had to save a little something to talk about at the International Supercomputing Conference in Hamburg, Germany this week, after all.) El Reg speculated that there would be a goosed interconnect and that SGI would stick to two-socket blades. We were right on the first count, but because the Xeon E5-4600 has two fewer QPI ports than the Xeon 7500 and E7, the bandwidth between the sockets on a two-socket blade would have been significantly diminished. It was easier and cleaner to make what is in effect a microserver and use the QPI ports to double up out to the new NUMAlink 6 interconnect hub, and that is what SGI has done.
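
The port budget behind that decision can be sketched in a few lines – this is our reading of the trade-off, not an SGI design document:

    # QPI port arithmetic, as we understand it (illustrative only).
    # Xeon 7500/E7: four QPI ports per socket. On a UV 1000 two-socket
    # blade, two cross-link the sockets, one feeds the Boxboro chipset,
    # and one feeds the NUMAlink 5 hub.
    uv1000_hub_links_per_socket = 4 - 2 - 1  # = 1

    # Xeon E5-4600: two QPI ports per socket. A two-socket blade would
    # leave one port for the socket cross-link and one for the hub,
    # cutting socket-to-socket bandwidth. A single-socket microserver
    # instead points both ports at the NUMAlink 6 hub:
    uv2000_hub_links_per_socket = 2

    print(uv1000_hub_links_per_socket, uv2000_hub_links_per_socket)  # 1 2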

SGI would have no doubt preferred to build the original UV 1000 machines, which debuted in November 2009 and which spanned 128 blades and 256 sockets in a shared memory configuration, using cheaper Xeon 5500 and 5600 processors. But these chips have only one QPI port coming off their sockets and their on-chip memory controllers cannot address as much memory as the Xeon 7500s and E7s, so SGI had no choice but to use the fat Xeons in 2009 and await the less expensive E5-4600s here in 2012.

The memory expansion on the E5-4600 chip is the key to the rejiggered UV 2000 machine, since each processor socket can currently hold a dozen memory slots and address up to 384GB of memory without any external memory buffers or funky chipsets. But the real secret sauce in the UV 2000 is the NUMAlink 6 interconnect, which is a substantial re-engineering of the NUMAlink 5 interconnect that offers about 2.5 times the bandwidth and a much simpler system design as well.
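
The per-socket figure is simple arithmetic, assuming 32GB DIMMs – the fattest parts commonly shipping at the time, and our assumption rather than anything on an SGI spec sheet:

    # Sanity check on the 384GB-per-socket claim.
    dimm_slots_per_socket = 12
    gb_per_dimm = 32  # assumption: 32GB DIMMs, not an SGI spec figure
    print(dimm_slots_per_socket * gb_per_dimm)  # 384 (GB per socket)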

Jill Matzke, director of server marketing at SGI, says that with NUMAlink 6, a bunch of different things happened all at once. First, SGI's chip partner, Avago Technologies, did a process shrink, allowing more stuff to be crammed onto the chip. (Avago, which is a spinout of Agilent Technologies, itself a spinout from Hewlett-Packard, doesn't actually make the NUMAlink chips; a fab in Taiwan does.) So SGI could take two of the NUMAlink hubs and put them onto a single chip, and could also bring the NUMAlink router onto the ASIC for the first time. Equally important, some of the functions that had been performed by the NUMAlink hub and router with the Xeon 7500 and E7 chips are now done by the Xeon E5s themselves – the on-chip PCI-Express controllers are one example. The result is a much simpler set of NUMAlink ASICs. (And you can see now why Intel wants to control the interconnects.)

With the UV 1000 design, there was a node controller in the blade chassis – which the nodes in the chassis shared – and a NUMAlink router at the top of the rack. With the UV 2000, more of the router functionality is contained in the NUMAlink hub/node controller on the system board, and the node controllers are doubled up for bandwidth. You can scale across two racks of UV 2000 machines without using an external top-of-rack router.

But, says Matzke, if you want to add extra bandwidth across those E5-4600 unisocket blades, you can add NUMAlink 6 routers at the top of the racks, too. This lets customers dial up CPU and bandwidth scalability independently of each other with the UV 2000, something you could not do with the UV 1000. The NUMAlink 6 interconnect provides 6.7GB/sec of bisection bandwidth.

A blade server from the UV 2 super

The basic node on the UV 2000 has two single-socket servers, with a vertical extender card sandwiched between the two stacked motherboards linking them together through a NUMAlink 6 hub chip. This packaging is similar, in concept, to the "Gemini" blade used in the ICE X Xeon E5-2600 clusters that were previewed last November at SC11 and that started shipping in March of this year. A 10U chassis holds eight of these half-width nodes, with up to 128 cores and 4TB of memory. A single rack holds four chassis, for up to 512 cores and 16TB of memory, and a fully loaded UV 2000 has eight racks for a total of 4,096 cores and 64TB of global shared memory. If Intel had switched on one more bit in the E5-4600 memory controller, SGI could have pushed the memory up to the full 128TB it is physically possible to put into the 512 nodes of a fully loaded UV 2000 machine. But it didn't, so you can't.
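
The arithmetic behind those totals, and the address-bit ceiling, runs as follows – the 46-bit figure is our inference from the "one more bit" remark, not a number from SGI or Intel:

    # Totals for a fully loaded UV 2000, and why memory stops at 64TB.
    cores_per_chassis = 128   # eight half-width nodes, 16 cores apiece
    tb_per_chassis = 4
    chassis_per_rack = 4
    racks = 8

    print(cores_per_chassis * chassis_per_rack * racks)  # 4096 cores
    print(tb_per_chassis * chassis_per_rack * racks)     # 128TB installable

    # The shared-memory ceiling is set by address width, not DIMM slots:
    # 46 usable physical address bits gives 2**46 bytes, or 64TB. One more
    # bit would double that to the full 128TB. (The 46-bit reading is our
    # inference, not a spec sheet number.)
    print(2**46 // 2**40)  # 64 (TB addressable)

In other words, half the physical capacity is stranded until Intel turns that extra address bit on.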
