Geeks pray $100,000 box will solve software crisis
The 1,000-chip 'MIT Fix'
Hot Chips There's a weird 1,000-processor computer floating about that's being hailed as the "MIT Fix."
Over the years, MIT's computer science department has built up a reputation for making all-too-bold performance claims about its systems and software research. Rival institutions and companies, hampered by time and financial constraints, can't always craft a cutting-edge replica of MIT's gear to verify just how accurate the school's claims are.
On a much larger level, a similar problem is haunting the entire computing industry.
We're heading at speed from two-, four- and eight-core processors to tens and even hundreds of cores per chip. Such products will require new software programming models and new system designs. Few researchers, however, can afford to play around with a 1,000-core system – that is, if such a system were readily available in the first place.
To test out cutting-edge code and hardware, a group of computing aficionados – both academics and free-range researchers – has teamed up to create the RAMP (Research Accelerator for Multiple Processors) system. This "MIT Fix" isn't a supercomputer, as one might be inclined to think. Rather, it's a relatively cheap 1,000-node machine made out of FPGAs (Field Programmable Gate Arrays) that can serve as a practical test system for futuristic system designs.
"Little is known on how to build, program, or manage systems of 64 to 1024 processors, and the computer architecture community lacks the basic infrastructure tools required to carry out this research," the RAMP group writes on its web site. "Fortunately, Moore's law has not only enabled these dense multi-core chips, it has also enabled extremely dense FPGAs.
"Today, one to two dozen cores can be programmed into a single FPGA. With multiple FPGAs on a board and multiple boards in a system, large complex architectures can be explored."
The RAMP project has garnered special attention due to one of its leads, computing legend David Patterson. The Berkeley researcher pushed one of the first RISC designs, which laid the groundwork for the SPARC processor architecture used by Sun Microsystems and Fujitsu. Later, he led the RAID storage project and teamed with John Hennessy – then a Stanford professor and now its Prez – to write a seminal computer science text.
Patterson, speaking yesterday at the Hot Chips conference here, sees a growing disconnect between hardware and software designers. The move to multi-core chips, which is already well underway, will demand more complex, multi-threaded applications.
"What's wrong with the multi-core change is that no one is ready for it," he said. "The pieces of the software stack are not ready for thousands of CPUs per chip."
Software designers tend to be reluctant to begin writing complex code before plenty of hardware arrives to handle their applications. Such a strategy won't work out well in the context of the multi-core shift, according to Patterson. Those used to seeing performance increases in their code via GHz hikes will suffer from under-performing code that struggles to make its way across numerous, low-power chips. And we're talking about a problem that affects algorithms, programming languages, compilers, operating systems and libraries.
So far, researchers willing to tackle these software problems have had few practical hardware choices. The lucky few – very few – can shell out $50m for Unix-based SMP systems from the likes of SGI or Sun. For about $3m, an organization can build a test cluster using x86 servers and Linux, but such systems are often hard to manage and eat up space and power. Meanwhile, desktop simulators are cheap but not really capable of returning accurate results when you're talking about mimicking a 1,000-core machine.
Patterson, and researchers from Intel, Stanford, the University of Texas, Carnegie Mellon, the University of Washington, Berkeley and even MIT, think the RAMP system offers a nice middle ground.
A 1,000-processor replica will cost between $100,000 and $200,000, outperform a desktop simulator and closely replicate the conditions of a true multi-core machine. In addition, the researchers can reprogram the systems to handle different CPU architectures such as Power and UltraSPARC and different operating systems such as Linux and Solaris. (So far, the x86 crowd has declined to participate in the project despite the presence of Intel engineer Shih-Lien Lu.)
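For a rough sense of scale, the back-of-the-envelope sketch below combines the RAMP group's figure of one to two dozen cores per FPGA with the quoted price range. The function names are made up for illustration, and the sums ignore boards, memory and interconnect, so treat the results as ballpark numbers rather than a RAMP specification.

```python
def fpgas_needed(total_cores: int, cores_per_fpga: int) -> int:
    """Smallest number of FPGAs that can hold total_cores soft cores."""
    if total_cores < 0 or cores_per_fpga <= 0:
        raise ValueError("total_cores must be >= 0 and cores_per_fpga > 0")
    # Integer ceiling division, which avoids floating-point rounding.
    return -(-total_cores // cores_per_fpga)


def cost_per_core(system_cost_usd: int, total_cores: int) -> float:
    """Average system cost per emulated processor, in US dollars."""
    if total_cores <= 0:
        raise ValueError("total_cores must be > 0")
    return system_cost_usd / total_cores


if __name__ == "__main__":
    TOTAL_CORES = 1000  # the 1,000-processor system described in the article

    # "One to two dozen cores" per FPGA, per the RAMP group.
    for cores_per_fpga in (12, 24):
        count = fpgas_needed(TOTAL_CORES, cores_per_fpga)
        print(f"{cores_per_fpga} cores per FPGA -> {count} FPGAs")

    # The quoted price range for the whole box.
    for system_cost in (100_000, 200_000):
        per_core = cost_per_core(system_cost, TOTAL_CORES)
        print(f"${system_cost:,} system -> ${per_core:.0f} per processor")
```

On those assumptions, the box would need somewhere between 42 and 84 FPGAs and would work out at $100 to $200 per emulated processor.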
Critics charge that at 200MHz the FPGAs will run too slowly to give accurate results, especially with speedier memory components surrounding them. Patterson, however, stressed that the RAMP crew will focus on "clock cycle accounting." They will tweak different parameters such as bandwidth, cache size and storage and then give researchers or companies an idea of how many clock cycles it takes to complete a given operation. This should provide customers with a picture of "how their application will run on a computer of the future."
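The article doesn't spell out how RAMP's clock cycle accounting works, so the sketch below is only a simplified illustration of the general idea: count cycles on the machine being modelled, charge memory delays in those same cycles, and the speed of the FPGA doing the emulating stops mattering. Every name here is hypothetical, and the 2GHz target clock, 100ns memory latency and workload figures are invented for the example. Only the 200MHz FPGA clock comes from the article.

```python
from dataclasses import dataclass


def memory_latency_cycles(latency_ns: int, target_clock_mhz: int) -> int:
    """Express a modelled memory latency as whole target-machine cycles."""
    # latency_ns * MHz / 1000 gives cycles; round up with integer arithmetic.
    return -(-(latency_ns * target_clock_mhz) // 1000)


@dataclass
class CycleAccount:
    """Tally of target-machine cycles, split by cause (hypothetical model)."""
    compute: int = 0
    memory_stall: int = 0

    def total(self) -> int:
        return self.compute + self.memory_stall


def runtime_seconds(cycles: int, clock_mhz: int) -> float:
    """Time needed to execute `cycles` clock cycles at `clock_mhz`."""
    return cycles / (clock_mhz * 1_000_000)


if __name__ == "__main__":
    TARGET_MHZ = 2000  # assumed future chip, not a figure from the article
    FPGA_MHZ = 200     # emulation clock cited by RAMP's critics

    miss_penalty = memory_latency_cycles(latency_ns=100, target_clock_mhz=TARGET_MHZ)

    # Invented workload: 1.8 billion compute cycles plus 1 million cache misses.
    account = CycleAccount(compute=1_800_000_000,
                           memory_stall=1_000_000 * miss_penalty)

    print("target cycles:", account.total())
    print("projected runtime on target (s):",
          runtime_seconds(account.total(), TARGET_MHZ))
    # Assumes, for simplicity, one target cycle emulated per FPGA cycle.
    print("wall-clock time on FPGA (s):",
          runtime_seconds(account.total(), FPGA_MHZ))
```

In this toy model the emulation runs ten times slower than the imagined chip, yet the cycle count it reports is the same, which is the property Patterson is counting on.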
"It has to provide faithful and credible results," Patterson said.
By next month, the RAMP group hopes to settle on the first processor architecture it will tackle, and it looks to be leaning toward Power and Linux. Then, over the next year, the team wants to nail down basics such as accurate clock cycle accounting before moving on to more complex functions such as transactional memory, all running on up to 256 processors.
Patterson hopes that large vendors and even start-ups will embrace the open RAMP work and get test systems out to universities and software designers.
Hopefully, someone will send MIT a system too.
Berkeley's Patterson gave rival MIT some playful ribbing during his Hot Chips speech, complaining that the school often produces hard-to-replicate results. A Sun engineer at the show backed up the joke, saying the company is hesitant to trust performance claims from MIT's computer science department. A RAMP box could make those claims easier to check. ®