Optimisers not optimistic about Merced

That old load latency has it in its spell...

A reader has pointed out that one big problem with Intel's forthcoming Merced processor is compiler optimisation. The reader, who preferred not to be named, said that optimising simple programs with small data sets is easy, but for real world applications, including server programs, it is almost impossible. That is because behaviour varies too much with the type of data used. "Basically, the only way to optimise for a particular data set is to dynamically recompile explicitly for that data set," he said. "In a way, this is what the Alpha EV6 does, but the IA-64 was designed explicitly to avoid this type of problem."

He explained that the problem is down to load latency. For Merced, instructions are grouped together into a block, all of which can be executed in parallel. "You cannot dynamically re-order the execution of these blocks and they must finish in the same order they started." If one block stalls, the following block will stall too. UltraSparc and Alpha processors, he said, do not do this unless there is a data dependency. "Even StrongARM, which is single issue, can execute instructions following a load if there is no data dependency."

This means that you are at the mercy of data and memory access speed, that is, load latency. If the data set fits into the cache, it will probably work. But the speed difference between CPU and memory is getting bigger and bigger, and that makes things worse. He claimed that this could be one reason Intel is so intent on pushing Rambus as a standard, as it has a faster clock rate, even though it does not work well in servers. ®
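The reader's argument can be illustrated with a toy model. The sketch below is not a simulation of Merced or of any real processor: the `Block` type, the cycle counts and both scheduling functions are assumptions made up for illustration. It compares a strictly in-order machine, where every block waits for the one before it to finish, with a machine that holds a block back only when it depends on a result that is not yet ready.

```python
from dataclasses import dataclass, field

@dataclass
class Block:
    latency: int                               # cycles until results are ready
    deps: list = field(default_factory=list)   # indices of earlier blocks it needs

def in_order_cycles(blocks):
    """Hypothetical in-order machine: a stalled block delays every later block."""
    finish = 0
    for b in blocks:
        finish += b.latency
    return finish

def dependency_only_cycles(blocks):
    """Hypothetical machine that issues one block per cycle and waits
    only for the blocks a given block actually depends on."""
    finish = []
    for i, b in enumerate(blocks):
        ready = max((finish[d] for d in b.deps), default=0)
        start = max(i, ready)
        finish.append(start + b.latency)
    return max(finish, default=0)

def workload(load_latency):
    """One load, 50 independent one-cycle blocks, then a block that uses the load."""
    load = Block(load_latency)
    independent = [Block(1) for _ in range(50)]
    consumer = Block(1, deps=[0])
    return [load] + independent + [consumer]

if __name__ == "__main__":
    for name, latency in (("cache hit", 2), ("cache miss", 100)):
        w = workload(latency)
        print(name, in_order_cycles(w), dependency_only_cycles(w))
```

With the assumed two-cycle cache hit, the two models finish within a cycle of each other. With the assumed 100-cycle miss, the in-order model pays for the full miss plus all the work behind it, while the other model hides the independent work under the miss. The numbers are arbitrary, but the shape of the result is the point the reader is making.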
