IBM chums with Swiss to build 3D brain-density processors
'Without them, IT will gobble entire world power supply'
Boffins in Switzerland have warned that increasingly powerful computer processors are set to guzzle the entire world's electricity supply by the year 2100. They say that only 3D myria-core chips can save the day.
Getting on top of Moore's law.
"Industry’s data centres already consume as much as 2% of available electricity," says John R Thome of the École Polytechnique Fédérale de Lausanne (EPFL).
"As consumption doubles over a five-year period, the supercomputers of 2100 would theoretically use up the whole of the electrical supply!”
According to Thome, the answer is to expand on today's multicore processors by building three-dimensional arrays of cores - rather than just laying them out on a sheet. Boffins at the EPFL have allied with others at the Eidgenössische Technische Hochschule in Zürich (ETH Zurich) and at IBM's Swiss lab at Rüschlikon to conduct the CMOSAIC project, which aims to deliver processors with as many transistors per cm³ as there are neurons in the human brain. IBM has just come on board the Swiss government-funded effort.
According to Thome and his colleagues, 3D processors will be much more energy-efficient than ordinary flat ones, heading off the prospect of their gobbling up the world's power supply. But they'll still heat up - and air cooling won't be good enough.
The boffins plan to build their 3D devices with a network of cooling pipes running through them, each about 50 microns across - roughly the thickness of a human hair. The pipes will carry liquid coolant, which the hot cores will heat into a vapour; it will then be condensed, cooled and recirculated.
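For a rough sense of scale, the flow such a two-phase loop needs can be estimated from the coolant's latent heat alone. The working fluid (water) and the per-stack heat loads in the sketch below are assumptions for illustration, not the project's own numbers.

```python
# Two-phase cooling: if the heat load is absorbed entirely as latent heat,
# the required coolant mass flow is m_dot = Q / h_fg. The latent heat used
# here is for water at atmospheric pressure; the actual working fluid and
# the per-stack heat loads are illustrative assumptions only.

def coolant_mass_flow(heat_load_w, latent_heat_j_per_kg=2.26e6):
    """Mass flow (kg/s) needed to carry away `heat_load_w` by boiling alone."""
    return heat_load_w / latent_heat_j_per_kg

if __name__ == "__main__":
    for watts in (100, 500, 1000):  # assumed per-stack heat loads
        grams_per_second = coolant_mass_flow(watts) * 1e3
        print(f"{watts:4d} W -> about {grams_per_second:.2f} g/s of coolant boiled off")
```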
The assembled Swiss boffins expect to see their first 3D integrated circuits going into supercomputers around 2015, and the first chips fitted with the new liquid-chill gear in 2020.
There's more on the CMOSAIC project here. ®
I see the 80s are back again.
Slots in the back face of microchips for cooling - see the Petersen review article ("Silicon as a [micro?]mechanical material") in IEEE. Stacked processing chips would be the Hughes airborne computer efforts. IIRC the ultimate goal was a sensor layer (UV, IR, vis, whatever) with multiple layers underneath for memory and processing. The plan was to use spots of tin on the chip surface and a front-to-back temperature gradient to drive in the tin and create a front-to-back conductive channel.
They might also look at what happened when Gene Amdahl set up "Trilogy" to do wafer-scale integration. The key problem (which broke them) was the inability to find a way, at a reasonable cost, to make wafers that could cut the failed sections out of the path and leave the rest running.
Note that most current liquid cooling methods seem to use water in a heat-pipe arrangement, which is itself a form of controlled boiling.
I would suggest that all of this is a bit of a red herring. The ultimate source of *most* of that heat is the clock driver transistors that distribute the umpteen-GHz system clock, *regardless* of whether or not that particular section is actually even operating.
Putting more chips together more closely merely means they will waste even more energy in a confined space.
If you want lower power you implement asynchronous (clockless) systems like the Manchester ARM developments or the design libraries of Philips.
People will only start looking at this when someone works out a way to sell clockless processors while differentiating different grades (i.e. cost) based on some sort of parameter people can compare. Some kind of agreed "throughput" measure would be reasonable, but the time from reading some values from off-chip RAM to writing them back (to off-chip RAM) is likely to be rather longer than the sub-ns duration of a clock cycle.
AFAIK the limitations on the speed-up brought by parallel processing have not gone away. Somewhere in the 10-16 processor area is where the shared memory approach hits the skids. There were very good reasons why the Transputer was conceived as it was. People who ignore them are asking for trouble.
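For anyone who wants the number behind that, the classic statement of the limit is Amdahl's law. Rough sketch below; the 95% parallel fraction is chosen purely as an illustration, not a claim about any particular workload.

```python
# Amdahl's law: speedup(n) = 1 / ((1 - p) + p / n), where p is the fraction of
# the work that can run in parallel. Even a small serial fraction flattens the
# curve long before large processor counts; p = 0.95 here is illustrative only.

def amdahl_speedup(n_processors, parallel_fraction=0.95):
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_processors)

if __name__ == "__main__":
    for n in (2, 4, 8, 16, 64, 1024):
        print(f"{n:5d} processors -> speedup {amdahl_speedup(n):6.2f}x")
```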
As for "Neuronal density," individual transistors are long past that in the 2d sense. The *huge* relative thickness of dead silicon substrate is the problem here. All the real action is within roughly <5microns of the surface.
And of course making a unit mimic the actual action of a true net of neurons and axons is another matter.
Mine's the one with Carver Mead's Analogue VLSI circuit design in it.
Why not use R410A or another non-conductive, non-toxic, non-corrosive industrial refrigerant and rid themselves of the pipes? Just pump the coolant in one side of the stack, let it flow while evaporating, and collect the resultant hot gas on the other side. No pipes, just small expanding holes etched in the surface of the bottom chip, one quick-connect pipe socket in each side of the module, and you're ready to go.
Use one compressor for all parallel chips, and get the heat-moving efficiencies associated with heat-pump aircon systems.
sharp enough to cut yourself
You raise some good points, but some of your points seem to be based on slightly outdated information.
1) Problems re wafer-scale integration: We're now at the point where very few large dies are defect-free. Most chip houses already have technology that allows them to blow fuses and disable whichever cores/caches have defects; then they bin the parts and set prices accordingly. 3-d technology could potentially allow the use of smaller dies, improving the proportion of perfect dies, which could possibly even simplify matters (see the yield sketch after this list).
2) Energy consumption and clocks: Clock gating is now fairly established technology. Switching power drain is actually becoming less important compared to leakage power, and strides are being made there too (e.g. through silicon-on-insulator technology and ground/power gating). Again, 3-d technology might help here. If the same sort of die is stacked vertically, circuit designers might be able to take advantage of vertically stacked units to reduce the size of their clock tree relative to a chip with units side-by-side (e.g. use a single-layer clock grid for the entire chip, then have vertical taps down to every register - I realize it's probably not exactly that simple). If the total wire length for the clock tree can be reduced, the capacitance should drop and with it the energy consumption (see the power sketch after this list).
3) Asynchronous processors and lack of adoption: Use of clock rate as a metric for selling parts has been steadily phasing out over the last several years. The real reason we're not seeing asynchronous processors is that they are very, very difficult to design and hard to test. Given the complexity of a billion-plus transistor system and the number of engineers required, this is a killer. I guarantee that if someone is able to develop methodologies that simplify asynchronous design and test to the same level as synchronous design, the advantages in power consumption and performance will win the day.
4) Parallel processing and limitations on scaling: It all depends on what you want to do. GPUs are massively parallel and, because of the problem domain, can take advantage of it. I would suggest that as we increasingly store and record multimedia content and expect our devices to deal with it in an intelligent manner (e.g. "Computer, find me the picture of me 'n' Ted down the bar last Friday."), our problem domain becomes more parallel. This sort of behavior will require a wide variety of very intense but computationally different and largely independent tasks. In that scenario, more processors -> more processing per unit time (possibly human reaction time for a hand-held, or maybe a few weeks for scientific applications), rather than more processors -> same processing in less time.
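Re 1): a quick sketch of why smaller dies help, using a simple Poisson defect model. The defect density and die areas are made-up illustrative numbers, not anyone's process data.

```python
import math

# Poisson yield model: the chance a die is defect-free is
#     Y = exp(-defect_density * die_area)
# so cutting the die area (e.g. stacking several small dies instead of one
# large planar one) raises the fraction of perfect dies. Defect density and
# areas below are illustrative assumptions only.

def die_yield(die_area_cm2, defects_per_cm2=0.5):
    """Fraction of dies with zero defects under a Poisson defect model."""
    return math.exp(-defects_per_cm2 * die_area_cm2)

if __name__ == "__main__":
    big_die = 4.0     # cm^2, one large planar die (assumed)
    small_die = 1.0   # cm^2, one layer of a four-layer stack (assumed)
    print(f"One {big_die:.0f} cm^2 planar die, defect-free: {die_yield(big_die):.1%}")
    print(f"One {small_die:.0f} cm^2 layer, defect-free: {die_yield(small_die):.1%}")
    # If layers can be tested before stacking ("known good die"), stacks are
    # assembled mostly from the ~61% of small dies that pass, rather than
    # paying the ~14% all-or-nothing yield of one big die.
```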
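Re 2): and a sketch of the clock-power argument, using the usual dynamic power relation P = alpha * C * V^2 * f. The capacitance per mm, wire lengths, supply voltage and frequency are illustrative assumptions, not figures from any real design.

```python
# Dynamic (switching) power of a clock network scales roughly as
#     P = alpha * C * V^2 * f
# with C dominated by total wire length, so a shorter clock tree means
# proportionally less switching power. All the numbers below are assumptions
# for illustration only.

def clock_power_w(wire_length_mm, cap_per_mm_f=0.2e-12, vdd=1.0,
                  freq_hz=3e9, activity=1.0):
    """Switching power (W) of a clock net with the given total wire length."""
    capacitance = wire_length_mm * cap_per_mm_f
    return activity * capacitance * vdd ** 2 * freq_hz

if __name__ == "__main__":
    flat_tree_mm = 4000.0     # assumed total clock wire for a planar layout
    stacked_tree_mm = 1500.0  # assumed shorter tree using vertical taps
    print(f"Planar clock tree : {clock_power_w(flat_tree_mm):.2f} W")
    print(f"Stacked clock tree: {clock_power_w(stacked_tree_mm):.2f} W")
```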
Aren't computers cool?