The distinction between driving a display and general-purpose programming is blurring. As game visuals become more advanced, more of the code is devoted to simulating real-world physics. "The combination of simulation and visualisation is going to transform how people enjoy games," Huang says.
In the same way, designers and engineers with workstations can use GPU accelerators to render accurate simulations of their designs. NVIDIA Maximus uses two GPUs, one from its Tesla line for general-purpose computation and the other a Quadro for the display. "Now the workstation is completely changed because it can combine the workflow of two parts of the design, the design part, and the simulation part," claims Huang.
Huang is looking forward to Windows on ARM. He talks about the Asus Transformer tablet and its long battery life, and then says: "Imagine Windows on ARM on that device, and next-generation versions of that device. It's a foregone conclusion that the PC industry will be revolutionised. I'm anxious to see Windows on ARM come to market and I think Microsoft is going to be very successful with it."
There are a few clouds on NVIDIA's horizon. One is that ARM, which dominates the world of mobile CPUs, is now also designing mobile GPUs under the Mali brand. That could undermine NVIDIA's Tegra business: Tegra is a SoC (System on a Chip) that combines an ARM CPU with an NVIDIA GPU. Huang does his best to dismiss Mali as having only "basic capabilities". He adds, "We have to continue to find our value-add; if we don't, then we don't have a role in the world."
Huang will not be drawn on the subject of Kepler, his company's next-generation GPU family, which seems to be delayed, though only in a notional sense, since no date has been announced.
The Intel issue
There is also Intel to think about. Intel's multi-core evangelist James Reinders says its forthcoming "Knights Corner" MIC (Many Integrated Core) processor will solve the efficiency issues Huang describes. "Knights Corner is superior to any general-purpose GPU type solution for two reasons," Reinders tells us.
"We don't have the extra power-sucking silicon wasted on graphics functionality when all we want to do is compute in a power-efficient manner, and - second - we can dedicate our design to being highly programmable because we aren't a GPU - we're an x86 core, a Pentium-like core for 'in order' power efficiency. Every algorithm that can run on GPGPUs will certainly be able to run on a MIC co-processor."
"MIC used to be a GPU," says Huang when asked about Intel's co-processor. "MIC is Larrabee 3, and Larrabee 1 was a GPU. So there is no difference, except of course that we care very much about GPU computing, and we believe this is going to be the way that high performance computing is performed."
NVIDIA's other advantage? CUDA is available now. ®
There are a few big problems with asynchronous computing. One of them is instruction coordination and dependency. A lot of computing tasks in a CPU are interdependent, so they have to wait on the state of other units. In synchronous designs, timings are well known, so processors can be tuned so that dependent results arrive at predictable times, reducing the need for gatekeeping.
The second and more fundamental issue is uncertainty. Without a clock ruling the processor, there must still be some form of coordination between the parts of the processor to determine who gets what first, and so on. Otherwise you can end up in metastable states, which produce dangerous uncertainty.
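In asynchronous hardware that coordination is usually handled with explicit request/acknowledge handshakes instead of a shared clock. A rough software analogy of the idea, with all names invented for illustration (this sketches the protocol, not any real circuit):

```python
import threading
import queue

results = []

def producer(req: queue.Queue, ack: threading.Event) -> None:
    """Send each value only after the consumer acknowledges the previous one."""
    for value in range(3):
        req.put(value)  # assert "request" along with the data
        ack.wait()      # block until the consumer asserts "acknowledge"
        ack.clear()     # reset the handshake for the next transfer

def consumer(req: queue.Queue, ack: threading.Event) -> None:
    for _ in range(3):
        results.append(req.get())  # latch the data
        ack.set()                  # assert "acknowledge"

req, ack = queue.Queue(), threading.Event()
threads = [threading.Thread(target=producer, args=(req, ack)),
           threading.Thread(target=consumer, args=(req, ack))]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # [0, 1, 2]
```

Neither side assumes anything about the other's speed - each transfer completes only when both have signalled - which is exactly what a clock would otherwise guarantee.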
Until we can get processors whose speed scales freely, it would be difficult indeed to combine them - especially in terms of power. Otherwise, it's not worth it.
Though I could see it working - turn off all the low speed/high parallel processors, turn on the high speed/low parallel, process the serial command quickly, then flip it back the other way - it wouldn't be worth the effort to put it on one chip, rather than two distinct chips.
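That switching scheme - power down the wide parallel side, wake the fast serial core, run the serial section, then flip back - can be sketched as a simple dispatcher. Everything here (core names, the two-mode split) is made up purely to illustrate the idea:

```python
# Hypothetical two-mode dispatcher: one fast serial core and one pool of
# slow, highly parallel cores, only one of which is "powered" at a time.
from dataclasses import dataclass

@dataclass
class Core:
    name: str
    powered: bool = False

fast_core = Core("fast-serial")
parallel_pool = Core("slow-parallel-x32")

def run(task_kind: str, work):
    """Power up whichever side suits the task, power down the other, run it."""
    target = fast_core if task_kind == "serial" else parallel_pool
    other = parallel_pool if target is fast_core else fast_core
    other.powered = False   # flip the other side off first
    target.powered = True
    return work()

r1 = run("parallel", lambda: sum(range(10)))  # wide work on the slow cores
r2 = run("serial", lambda: 2 + 2)             # serial command on the fast core
print(r1, r2, fast_core.powered, parallel_pool.powered)  # 45 4 True False
```

The cost the comment alludes to is exactly this mode switch: every transition means flushing state from one side and powering up the other, which is why two discrete chips may be the cheaper engineering choice.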
Dual CPU GPU
@ArmanX tricky to do with big, complex x86-type processors. One problem is that they are synchronous - all the billions of transistors clock together, which uses a huge amount of power. GPUs typically run at a much lower clock speed.
Designing CPUs where different parts run at different speeds, or free-run, has been a series of failures for at least the last 25 years. One possible approach is to add a small linear CPU to the GPU to do the boring housekeeping stuff - but probably an ARM rather than an x86.