3D processor-memory mashups take center stage

'I have seen the future, and it is stacked'

ISSCC A trio of devices that stack layers of compute units and memory in a single chip to boost interconnect bandwidth were presented at this week's International Solid-State Circuits Conference in San Francisco.

Sharing the stage at the ISSCC's High Performance Digital session were three technologies: one prototype developed by IBM that places cache memory layers on top of a "processor proxy" layer, and two working chips – one developed at the University of Michigan, and another by the Georgia Institute of Technology working with KAIST and Amkor Technology, both in South Korea.

Note that these parts aren't merely RAM-stacked-on-top-of-a-processor packages such as, for example, Apple's A5. These are single parts with processor and memory closely coupled, married together in a single slab.

The ISSCC presentations were each titled in impressive boffin-speak – so impressive that we'll quote the title of each paper before we dig into a few of its details.

IBM's "3D system prototype of an eDRAM cache stacked over processor-like logic using through-silicon vias": Like the other two chips, the IBM prototype routes data, clock, and power signals through its layers – what IBM calls "strata" – by means of through-silicon vias (TSVs).

TSVs are essentially just what they sound like: signal paths that are etched through a silicon layer and filled with a conductor. In IBM's prototype, the TSVs are copper-filled, and are about 20 micrometers (0.0008 inches) in diameter.

IBM's TSVs are connected layer-by-layer with tiny conductive balls.

The prototype that IBM presented at ISSCC was a two-strata affair, but the design is intended to be extendable to more cache-memory strata. The design of those cache strata borrows heavily from the Power7's integrated L3 cache, including its embedded DRAM (eDRAM), IP library, logic macros, and design and test flow.

IBM didn't use a true processor for the base of its stack, but instead a test-only proxy that includes circuits to exercise the memory and to emulate the noise and power of a true processor, up to 350 watts per square centimeter.

Unlike the two other larger-process 3D chips presented, IBM's is built at a snug 45nm.

That high power level would be needed in the target four-strata design. IBM estimates the clock skew of a four-strata design at less than 13 picoseconds, which would allow a "worst-case" L3 clock frequency of 2GHz and a resulting data bandwidth of 450 gigabits per second.
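To put those figures in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes one data transfer per L3 clock cycle, which IBM's presentation as reported here does not confirm, and the function names are our own illustration rather than anything from the paper.

```python
# Figures quoted from IBM's ISSCC presentation, as reported above.
L3_CLOCK_HZ = 2e9              # "worst-case" L3 clock frequency: 2GHz
BANDWIDTH_BITS_PER_S = 450e9   # data bandwidth: 450 gigabits per second
CLOCK_SKEW_PS = 13             # estimated upper bound on clock skew


def bits_per_cycle(bandwidth_bits_per_s, clock_hz):
    """Bits moved per clock cycle, ASSUMING one transfer per cycle."""
    return bandwidth_bits_per_s / clock_hz


def clock_period_ps(clock_hz):
    """Length of one clock cycle in picoseconds."""
    return 1e12 / clock_hz


def skew_fraction(skew_ps, clock_hz):
    """Clock skew expressed as a fraction of one clock period."""
    return skew_ps / clock_period_ps(clock_hz)


if __name__ == "__main__":
    print("bits per cycle:", bits_per_cycle(BANDWIDTH_BITS_PER_S, L3_CLOCK_HZ))
    print("clock period (ps):", clock_period_ps(L3_CLOCK_HZ))
    print("skew as share of period:", skew_fraction(CLOCK_SKEW_PS, L3_CLOCK_HZ))
```

Under that single-transfer assumption, 450 gigabits per second at 2GHz works out to 225 bits per cycle, and a 13-picosecond skew is under 3 per cent of the 500-picosecond clock period.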
