Intel puts x64 in a parallel universe

Taking the MIC out of Larrabee

Tue 1 Jun 2010 // 05:24 UTC

No GPU speak required

The important thing about MIC, as far as Intel is concerned, is that the same C, C++, and Fortran compilers and the same developer tools and libraries used by HPC customers who deploy parallel Xeon server clusters will work on the MIC co-processors. There will be different optimizations, of course. But you don't have to speak GPU to make these things work.

At ISC, Skaugen said that the Knights Ferry co-processor used an MIC chip code-named "Aubrey Isle," which you can see below:

You don't need a supercomputer to count the 32 cores on this die. What's weird about this Aubrey Isle chip is how there are seven groups of four cores, and then one group that seems to be scattered around the die near what seems to be interconnect electronics. If I had to guess — and I have to — the vector unit is on top of the die (you can see 16 splotches, each one capable of doing one floating point operation) and the four cores are below them (the squares with the dark edges).

The rest of each core seems to be L1 and L2 cache memory, and it is a fair guess that the coherent L2 cache is made up of a segment of L2 cache in each core that is shared by all the cores. The two horizontal stripes would therefore implement the ring interconnect lashing the cores together. No word on how big this Aubrey Isle beast is or what process it was implemented in, but presumably it is made using Intel's current 32 nanometer processes and is too damned hot to be put into production.

Skaugen said at ISC that Intel will be ramping up production on the Knights Ferry development co-processor throughout 2010. It is the Knights Corner co-processors that will put what was once Larrabee into a proper device aimed at real HPC shops. Knights Corner appears to be the name of the entire device, not the chip, just as Aubrey Isle was the name of the chip used in the Knights Ferry co-processor. The chip inside the Knights Corner co-processor will be implemented in a 22 nanometer process and will have at least 50 of the x64 cores on it, plus an unspecified number of vector processors. It is fairly likely that Intel is designing 64 cores onto the chip, and then, yields being what they are on massive chips, cores with boogers in them will be deactivated and customers will get what they get.

Similarly, Nvidia's "Fermi" graphics co-processors were designed with 512 cores, but when the machines came out earlier this year, the yields were such that Nvidia could only pump out chips with 448 working cores. The flops were more or less the same, probably because Nvidia cranked up the clocks, which it could do with 12.5 per cent of the cores being duds.
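The Fermi arithmetic can be checked in a few lines. This is only a sketch: it assumes peak flops scale linearly with working cores times clock speed, which is a simplification, and the function names are made up for illustration. The 512 and 448 core counts are the only inputs taken from the article.

```python
# Core counts quoted in the article for Nvidia's "Fermi" chips.
DESIGNED_CORES = 512
SHIPPED_CORES = 448


def disabled_fraction(designed: int, shipped: int) -> float:
    """Fraction of the designed cores that are switched off."""
    return (designed - shipped) / designed


def clock_scale_for_equal_flops(designed: int, shipped: int) -> float:
    """Clock multiplier needed to match the full-chip peak flops,
    ASSUMING peak flops = cores * clock * constant (a simplification)."""
    return designed / shipped


if __name__ == "__main__":
    duds = disabled_fraction(DESIGNED_CORES, SHIPPED_CORES)
    scale = clock_scale_for_equal_flops(DESIGNED_CORES, SHIPPED_CORES)
    print(f"cores disabled: {duds:.1%}")       # 12.5%
    print(f"clock scale needed: {scale:.3f}x")  # 1.143x
```

Under that assumption, a chip with an eighth of its cores dead needs clocks roughly 14 per cent higher to land on the same peak number.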

Intel was pretty vague about what kind of performance to expect from the Knights family of co-processors, but you can bet the marketing angle is that Intel does not have to offer the same single- or double-precision flops as AMD or Nvidia with their graphics co-processors because of the ease of programming that comes from using a co-processor based on the x64 instruction set. Intel was bragging at ISC that researchers at CERN were able to port a "complex C++ parallel benchmark" to the MIC software stack and experimental processor in "just a few days."

In its press release about the Knights family of co-processors, Intel said that the MIC architecture would accelerate "select highly parallel applications" but that the "vast majority of workloads will still run best on award-winning Intel Xeon processors." Particularly, a cynic would say, if Intel tries to charge $10,000 for one of these Knights. Then it is checkmate for the whole idea. In any event, Intel won't have the full MIC software development kit ready until sometime in the second half of 2010, and other chips, like the "Ivy Bridge" Xeons, are most likely at the front of the line for the 22 nanometer wafer baking next year.

By the way, Skaugen said at ISC that the next-generation "Sandy Bridge" Xeons, due by the end of the year, would have "significantly greater performance" than the current Westmere and Nehalem Xeons, with higher core counts and HyperThreading boosting performance. With the addition of new AVX vector math instructions, Sandy Bridge Xeons will in fact be able to process twice as many flops per clock as the current Xeons.

The word on the street is that Sandy Bridge Xeons will have 4, 6, or 8 cores and clock speeds of between 2.8 GHz and 3.4 GHz, not including Turbo Boost overclocking. With the AVX units, Sandy Bridge chips will do eight double-precision flops per clock per core, so call it 192 gigaflops with eight cores running at 3 GHz and assume we are talking a 130-watt power envelope.
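The 192 gigaflops figure is just cores times flops per clock times clock speed. A minimal sketch of that back-of-envelope sum follows; the helper name `peak_gflops` is hypothetical, and the 3 GHz clock and 130-watt envelope are the rumored figures quoted above, not confirmed specs.

```python
def peak_gflops(cores: int, flops_per_clock_per_core: int, clock_ghz: float) -> float:
    """Theoretical peak in gigaflops: cores * flops/clock/core * GHz."""
    return cores * flops_per_clock_per_core * clock_ghz


if __name__ == "__main__":
    DP_FLOPS_PER_CLOCK = 8   # double-precision flops per clock per core with AVX
    CLOCK_GHZ = 3.0          # rumored clock used in the article's estimate
    POWER_WATTS = 130        # assumed power envelope from the article

    for cores in (4, 6, 8):  # rumored core counts
        gflops = peak_gflops(cores, DP_FLOPS_PER_CLOCK, CLOCK_GHZ)
        print(f"{cores} cores: {gflops:.0f} gigaflops peak")

    top = peak_gflops(8, DP_FLOPS_PER_CLOCK, CLOCK_GHZ)
    print(f"8 cores: {top / POWER_WATTS:.2f} gigaflops per watt")  # 1.48
```

This is a theoretical peak only; real code rarely keeps every vector lane busy on every clock.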

By comparison, a Knights co-processor with 50 cores and running at maybe 1.5 GHz could have as much as 2 teraflops of single-precision floating point performance. It is unclear if it will be able to run double precision calculations with any speed, but such a chip supporting 1 teraflops of double precision oomph would be compelling to a lot of HPC shops with lots of x64 code. ®
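To see what those guesses would demand of each core, the same sum can be run backwards. This sketch takes only the speculative figures above (50 cores, 1.5 GHz, 2 teraflops single precision, 1 teraflops double precision) and works out the implied flops per clock per core. The function name is hypothetical, and nothing here is an Intel specification.

```python
def implied_flops_per_clock_per_core(target_gflops: float, cores: int,
                                     clock_ghz: float) -> float:
    """Flops each core must retire per clock to reach a target peak."""
    return target_gflops / (cores * clock_ghz)


if __name__ == "__main__":
    CORES = 50        # speculative core count from the article
    CLOCK_GHZ = 1.5   # speculative clock speed from the article

    sp = implied_flops_per_clock_per_core(2000.0, CORES, CLOCK_GHZ)
    dp = implied_flops_per_clock_per_core(1000.0, CORES, CLOCK_GHZ)
    print(f"2 teraflops SP needs {sp:.1f} flops per clock per core")
    print(f"1 teraflops DP needs {dp:.1f} flops per clock per core")
```

For scale, the Sandy Bridge estimate earlier in the article works from eight double-precision flops per clock per core.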
