Exploding core counts: Heading for the buffers
Ferrari engine, meet go-kart
The top minds at IT analyst Gartner have been mulling over the ever-increasing number of cores on modern processors, and have reached a conclusion that many academic experts got to some time ago - boosting core counts to take advantage of Moore's Law is gonna run out of gas, and sooner rather than later, because software can't take advantage of the threads that chip makers can deliver. At least not the current crop of systems and application software.
By Gartner's reckoning, the doubling of core or thread counts every two years or so - and for some architectures, the thread growth is even larger with each jump - will affect all layers of the software stack. Operating systems, middleware, virtualization hypervisors, and applications riding atop this code all have their own limitations when it comes to exploiting threads and cores, and Gartner is warning IT shops to look carefully at their software before knee-jerking a purchase order to get the latest server as a means of boosting performance for their applications.
"Looking at the specifications for these software products, it is clear that many will be challenged to support the hardware configurations possible today and those that will be accelerating in the future," explains Carl Claunch, vice president and distinguished analyst at Gartner. "The impact is akin to putting a Ferrari engine in a go-cart; the power may be there, but design mismatches severely limit the ability to exploit it."
This core and thread madness is exactly the problem that David Patterson, who heads up the Parallel Computing Laboratory at the University of California at Berkeley (funded largely by Intel and Microsoft), went through in great detail in his keynote address at the SC08 supercomputing show last November. More recently, researchers at Sandia National Laboratories released a paper showing that chips run out of gas at eight cores.
IBM is expected to get to eight cores with Power7 next year, and Intel will get to eight cores with "Nehalem" Xeons this year. Sun Microsystems already has eight cores per chip (with eight threads per core) with its "Niagara" family of Sparc T series, and will boost that to 16 cores and 32 threads with its "Rock" UltraSparc-RK processors, due later this year.
Advanced Micro Devices has quad-core "Shanghai" Opterons in the field, and is only boosting this to six cores later this year with "Istanbul" Opterons. It will also put two six-core chips into a single package with "Magny-Cours" Opterons. (AMD remains the only major server chip maker that does not do multiple threads per core.)
Itanium is a bit of a laggard when it comes to core counts with "Tukwila," also due this year, having only four cores per die and two threads per core. But that could turn out to be good news for Intel.
Chip makers certainly didn't want to add this many cores to their processors, and roadmaps from only a few years ago showed processor clock speeds rising to 10 GHz and beyond, assuming that the Moore's Law shrinking of transistors would simply allow clocks to crank up more or less in proportion to the shrinkage in the circuits. Instead, depending on the architecture, chips have hit thermal walls at between 2.5 GHz and 5 GHz, because pushing clocks even a little bit higher creates an unacceptably large amount of heat.
Over at Gartner, Claunch says that there are hard and soft limits on how software can use threads and cores, and that these will limit the usefulness of boosting the core and thread counts in systems. He says that most virtualization hypervisors can't span 64 cores, and forget about the 1,024-core machines that could be put into the field four years from now if the core counts keep a-rising.
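To make the arithmetic behind that 1,024-core figure concrete, here is a minimal Python sketch. It assumes the doubling-every-two-years rate that Gartner cites, and a starting point of 256 cores or threads, which is simply what that doubling rule implies for today rather than a number Gartner gives. The function names are made up for illustration.

```python
def projected_count(start: int, years: int, doubling_period_years: int = 2) -> int:
    """Project a core/thread count, assuming one doubling per period.

    Only whole doubling periods are counted; a partial period adds nothing.
    """
    doublings = years // doubling_period_years
    return start * 2 ** doublings


def years_to_reach(start: int, target: int, doubling_period_years: int = 2) -> int:
    """Return how many years of steady doubling it takes to reach `target`."""
    years = 0
    count = start
    while count < target:
        count *= 2
        years += doubling_period_years
    return years


if __name__ == "__main__":
    # Assumed starting point: 256 cores or threads today (not a Gartner figure).
    print(projected_count(256, 4))
    # Same rule applied to a 64-thread starting point, for comparison.
    print(years_to_reach(64, 1024))
```

If the doubling period turns out shorter for some architectures, as Gartner suggests it might, passing a smaller `doubling_period_years` pulls the date in accordingly.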
Here's a hard limit: Claunch says that some operating systems have an eight-bit field that tells the operating system how many processors (real or virtual) it can hold, and that means 256 cores or threads is a maximum. (This can be changed, of course, but it has to be changed and it most likely will not be done as a patch to existing operating systems running in the data centers of the world.)
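A rough sketch of why an eight-bit field is a hard ceiling, in Python. The field width comes from Claunch's example; the function names and the wrap-around behaviour are illustrative assumptions, not a description of how any particular operating system stores its processor IDs.

```python
FIELD_BITS = 8  # width from Claunch's example; the field itself is hypothetical


def distinct_values(bits: int) -> int:
    """Number of distinct values an unsigned field of this width can hold."""
    return 1 << bits


def store_cpu_id(cpu_id: int, bits: int = FIELD_BITS) -> int:
    """Simulate writing an ID into a fixed-width field.

    Anything above the field width is masked off, so large IDs wrap around.
    """
    mask = (1 << bits) - 1
    return cpu_id & mask


if __name__ == "__main__":
    print(distinct_values(FIELD_BITS))   # how many processors can be told apart
    print(store_cpu_id(255))             # last ID that fits
    print(store_cpu_id(256))             # first ID that does not fit
```

With eight bits, ID 256 comes back as 0, so a 257th processor would be indistinguishable from the first. Widening the field fixes that, but it means changing the data structure and everything that reads it, which is why this is not the sort of thing that gets slipped in as a patch.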
"There is little doubt that multicore microprocessor architectures are doubling the number of processors per server, which in theory opens up tremendous new processing power," says Claunch. "However, while hard limits are readily apparent, soft limits on the number of processors that server software can handle are learned only through trial and error, creating challenges for IT leaders. The net result will be hurried migrations to new operating systems in a race to help the software keep up with the processing power available on tomorrow's servers."
Gartner's analysis does, of course, leave out one important issue. The main bottleneck on system performance is arguably - and man, do people argue about this - the limits on main memory capacity and bandwidth inside systems. In many cases, customers upgrade server platforms not because they need more CPU cores, but because they want both more memory and more bandwidth into and out of the CPUs.
Moreover, for some workloads - this is particularly true of online transaction processing - the amount of work a machine can do is more affected by the number of disk drive arms and the bandwidth in the disk subsystems than other factors, like the number of processor cores. In benchmark tests, server makers can get their server processors running at 95 per cent or higher utilization, but it is a very well run big iron box running Unix that can consistently stay at even a 60 to 70 per cent utilization rate running OLTP workloads.
I/O and memory bandwidth issues keep the processors tapping their feet, waiting for data. IBM's mainframe operating systems and middleware, as well as end user applications, have been tuned and tweaked over decades to wring every ounce of performance out of the box and run at 90 per cent or higher utilization rates in production environments, but if you paid five or ten times the amount it costs to buy a RISC or x64 server, you would spend a lot of dough on tuning, too. And having done all that work, you would sure as hell think twice before moving those applications off the mainframe. Which is why mainframes persist.
The biggest issue, it seems, is that memory speeds have not even come close to keeping pace with processor speeds, a gap that has been mitigated to a certain extent by the thermal wall that processors have hit. This is giving memory speeds a chance to catch up, perhaps. But the fastest DDR3 memory on the market still tops out at 1.3 GHz, and that is still less than half the speed of, say, a Nehalem Xeon processor that will hit the streets later this quarter. And even if you could get the speeds of CPU cores and memory in line, that doesn't solve the capacity issue.
Memory DIMMs can only be so small at a certain price per capacity, and motherboard makers can only put so many wires on the board for memory at a price. The memory issue is not going away. But solving this will perhaps be easier than coping with software stacks that don't understand how to make use of so many threads. ®