3D processor-memory mashups take center stage

'I have seen the future, and it is stacked'

How low can you go?

The University of Michigan's "Centip3De: A 3930DMIPS/W configurable near-threshold 3D stacked system with 64 ARM Cortex-M3 cores": The UoM's oh-so-cutely named Centip3De takes 3D chippery in a different direction – that of near-threshold computing (NTC).

NTC's focus is not to crank up processors with a boatload of juice in order to get their transistors switching at high frequencies, but just the opposite: to use just enough power to carry them over their operating-voltage threshold.

The advantage of NTC is clear: less power consumption – especially important if you're stacking compute and memory layers in the same chip, and don't want to watch the entire assemblage melt before your very eyes.

The disadvantage is equally clear: at low voltages, transistors switch slowly. However, if you have a large number of transistors in a large number of compute cores working on a highly parallelized workload, the voltage-supply math can work in your favor.

"By running at a lower voltage, we can have a higher energy efficiency and we can regain some of that performance loss by having many layers of silicon," UoM PhD student David Fick told his ISSCC audience.
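The trade-off Fick describes can be put into rough numbers. The sketch below uses the standard approximation that dynamic switching energy per operation scales with capacitance times voltage squared. Every voltage, frequency, and capacitance in it is a hypothetical placeholder, not a figure from the Centip3De paper, and it ignores leakage, which matters a great deal near threshold.

```python
# Illustrative only: the voltages, frequencies, and capacitance below are
# made-up placeholders, not measurements from the Centip3De paper.

def dynamic_energy_per_op(capacitance_f, voltage_v):
    """Switching energy per operation, using the C * V^2 approximation."""
    return capacitance_f * voltage_v ** 2

def throughput_ops_per_s(cores, freq_hz):
    """Aggregate throughput for a perfectly parallel workload (1 op/cycle/core)."""
    return cores * freq_hz

C = 1e-12  # hypothetical switched capacitance per operation, in farads

# One fast core at a nominal voltage versus eight slow cores at a low voltage.
fast = {"cores": 1, "freq_hz": 80e6, "voltage_v": 1.2}  # hypothetical
slow = {"cores": 8, "freq_hz": 10e6, "voltage_v": 0.6}  # hypothetical

for name, cfg in (("fast", fast), ("slow", slow)):
    ops = throughput_ops_per_s(cfg["cores"], cfg["freq_hz"])
    energy = dynamic_energy_per_op(C, cfg["voltage_v"])
    print(f"{name}: {ops / 1e6:.0f} Mops/s, {energy * 1e12:.2f} pJ/op")
```

With these placeholder values both configurations deliver the same aggregate throughput, but halving the voltage cuts the switching energy per operation to a quarter. The catch is that the workload has to parallelize across all eight cores.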

One problem with NTC is that the ideal – most efficient – operating voltage for a compute core is lower than that required for its associated cache memory. The Centip3De solves this problem by running the cache memory at four times the clock of the compute cores, cleverly clustering four cores per cache unit and managing the cache distribution among them.

Slide from ISSCC paper, 'Centip3De: A 3930DMIPS/W Configurable Near-Threshold 3D Stacked System with 64 ARM Cortex-M3 Cores'

The current Centip3De is a two-layer prototype, but the team plans a seven-layer future

For example, if each core is running at 10MHz, as Fick showed in one example, the cache could run at 40MHz. The cores each see a single L1 cache, and the clustering allows them to share it at their own core operating frequency with single-cycle latency.
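To see why a 4x cache clock lets four cores each get single-cycle access, consider this toy model. It assumes the cache serves the cores in a fixed round-robin order, one core per cache cycle. That arbitration scheme is an assumption made for illustration; the article does not say how the real chip schedules accesses.

```python
CORES_PER_CLUSTER = 4   # from the article: four cores share one cache
CACHE_CLOCK_RATIO = 4   # from the article: cache runs at 4x the core clock

def simulate(core_cycles):
    """Return, per core, the cache cycles in which that core was served.

    Assumption: fixed round-robin arbitration, one core per cache cycle.
    """
    served = {core: [] for core in range(CORES_PER_CLUSTER)}
    for cache_cycle in range(core_cycles * CACHE_CLOCK_RATIO):
        core = cache_cycle % CORES_PER_CLUSTER
        served[core].append(cache_cycle)
    return served

def accesses_per_core_cycle(served, core_cycles):
    """Average number of cache slots each core receives per core clock cycle."""
    return {core: len(slots) / core_cycles for core, slots in served.items()}

served = simulate(core_cycles=3)
print(accesses_per_core_cycle(served, core_cycles=3))
```

In this model every core receives exactly one cache slot per core cycle, so from each core's point of view the shared cache behaves like a private one running at the core's own clock.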

What's more, the Centip3De's cache design also allows one core to take over more cache space, should it need it, as long as another core's cache space can be reduced. There could conceivably be core data conflicts within the cluster, but Fick said that his team's architectural simulations had shown that "this was not a dominant effect."

In addition, cores could be entirely shut down – dynamically, of course – and their power could be passed to another core, thus increasing that core's frequency. You could, for example, have four cores in a cluster running at 10MHz each, or one at 40MHz, depending upon the needs of the workload. Entire clusters can be shut down, as well, and their power shunted to adjacent clusters.
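The four-at-10MHz versus one-at-40MHz example can be captured as a simple budget calculation. The sketch below assumes the cluster's clock budget is split evenly and linearly among the active cores. The article gives only the two endpoints, so the two-core and three-core values are interpolations, and the function name is hypothetical.

```python
CORES_PER_CLUSTER = 4
CLUSTER_BUDGET_MHZ = 40  # 4 cores x 10MHz, per the example in the article

def per_core_frequency_mhz(active_cores):
    """Clock available to each active core in one cluster.

    Assumption: the budget divides evenly and linearly among active cores.
    Only the 4-core (10MHz) and 1-core (40MHz) points come from the article.
    """
    if not 1 <= active_cores <= CORES_PER_CLUSTER:
        raise ValueError("active_cores must be between 1 and 4")
    return CLUSTER_BUDGET_MHZ / active_cores

for n in range(CORES_PER_CLUSTER, 0, -1):
    print(f"{n} active core(s): {per_core_frequency_mhz(n):.1f}MHz each")
```

A serial phase of a workload would favor the single fast core, while a parallel phase would favor all four slow ones.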

Slide from ISSCC Paper, 'Centip3De: A 3930DMIPS/W Configurable Near-Threshold 3D Stacked System with 64 ARM Cortex-M3 Cores'

Today's two-layer Centip3De processor and DRAM layers are of different process sizes

The current Centip3De chip was built using a 130nm process. The paper presented at ISSCC says that if the cores running at 10MHz in the prototype chip were baked using a 45nm SOI CMOS process, that'd translate to 45MHz per core. Fick told his audience that if the process were scaled to 32nm, those 10MHz cores could operate at 110MHz.

Those higher clock speeds would, of course, be throttled down if the compute cores were operated at near-threshold voltages.
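For a quick sense of scale, the projected core clocks quoted above work out as follows. The core figures are the ones given in the article; the implied cache clock assumes the prototype's 4:1 cache-to-core ratio would carry over to the smaller processes, which the article does not state.

```python
# Core clock figures quoted in the article, keyed by process node.
CORE_MHZ_BY_NODE = {"130nm": 10, "45nm": 45, "32nm": 110}
CACHE_CLOCK_RATIO = 4  # cache runs at 4x the core clock in the prototype

def speedup_over_prototype(node):
    """Projected core clock relative to the 130nm prototype."""
    return CORE_MHZ_BY_NODE[node] / CORE_MHZ_BY_NODE["130nm"]

def implied_cache_mhz(node):
    """Assumption: the 4:1 cache-to-core ratio is preserved at every node."""
    return CORE_MHZ_BY_NODE[node] * CACHE_CLOCK_RATIO

for node in CORE_MHZ_BY_NODE:
    print(f"{node}: core {CORE_MHZ_BY_NODE[node]}MHz, "
          f"{speedup_over_prototype(node):.1f}x prototype, "
          f"cache {implied_cache_mhz(node)}MHz if the 4:1 ratio held")
```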
