Texan researchers cheer tera-op chip endurance test
'We'll rock 2012'
The University of Texas plans next week to wow processor aficionados with a new chip that can chew through software at an unprecedented clip. The easily excited, however, will want to temper their enthusiasm, since the so-called TRIPS (Tera-op Reliable Intelligently adaptive Processing System) project seems to move at an un-lubed snail's pace.*
So far, UT researchers have crafted a two-core chip where each core can handle 16 out-of-order integer or floating point operations. All told, the TRIPS chip can stomach 1,024 instructions at the same time. Such a chip should speed up consumer, business and high performance computing workloads with no changes needed to current software, according to the researchers.
The UT crowd has been hammering away at the technology for seven years and once predicted a product capable of one trillion calculations per second (a tera-op) would arrive by 2010. Now, the group has pegged 2012 for its tera-op part.
Seven years ago it looked like Intel and AMD would just keep plugging away at high-GHz chips that handled single software threads well while consuming tons of energy. Over the past three years, however, all of the major chipmakers have shifted strategies to focus on multi-core designs, which combine slower individual processor cores into chips that can push through multi-threaded software at a solid rate.
Intel is one company hoping to take this technology to the extreme. It's already demonstrating an 80-core processor that has reached 2 teraflops of performance. The company plans to start showing off a similar processor that uses its popular x86 instruction set next year – obviously well ahead of the UT crowd.
But the TRIPS researchers claim Intel and the other big chipmakers may have overshot the mainstream market with their multi-core designs.
“They have made a big gamble that people writing software will figure out a way to write software that can use those processors with parallel programming,” said Steve Keckler, an associate professor at UT who has worked on TRIPS in conjunction with Doug Burger and Kathryn McKinley. “I think we will see a big wall as they try to go from 8 cores to 16 cores.”
To Keckler's point, software makers have already started to gripe about the multi-core chips from Intel and AMD. Such products require coders to embrace multi-threaded programming, which is quite different from what they're used to in the single-threaded world. Start-ups such as PeakStream and RapidMind have stepped in to solve this problem with code that allows single-threaded software to run very fast on multi-core processors, but it remains unclear whether the software industry as a whole will keep pace with the new designs.
“So, while we recognize that there is a need for parallel programming, we would like to build the most powerful uniprocessor that we can,” Keckler said.
IBM, which has been helping out with the TRIPS project, reckons that the big technology breakthrough here revolves around “block-oriented execution.”
“Instead of operating on only a few computations at a time, the TRIPS processor operates on large blocks of computations mapped to an array of execution units on the chip,” IBM said in a 2003 statement. “This approach allows many more instructions to execute in parallel, thus offering higher performance.”
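For the curious, the idea IBM describes can be sketched in a few lines of Python. This is a toy model, not the TRIPS microarchitecture: the function `schedule_block`, the sample `block`, and the notion of a "step" are all invented here for illustration, and the width of 16 is simply borrowed from the per-core figure quoted earlier. Each instruction fires as soon as the instructions it depends on have finished, and up to `width` of them can fire in the same step.

```python
def schedule_block(deps, width=16):
    """Toy dataflow scheduler (hypothetical, for illustration only).

    deps maps an instruction name to the names it depends on.
    Returns a list of steps; each step lists the instructions fired together.
    """
    done = set()
    pending = dict(deps)
    steps = []
    while pending:
        # An instruction is ready once every one of its inputs has been produced.
        ready = [name for name, srcs in pending.items()
                 if all(src in done for src in srcs)]
        if not ready:
            raise ValueError("block contains an unsatisfiable dependency")
        fired = ready[:width]  # at most `width` execution units per step
        for name in fired:
            del pending[name]
        done.update(fired)
        steps.append(fired)
    return steps


# A made-up block computing (a + b) + (c + d): four loads, two adds, one final add.
block = {
    "a": [], "b": [], "c": [], "d": [],
    "ab": ["a", "b"],
    "cd": ["c", "d"],
    "sum": ["ab", "cd"],
}

wide = schedule_block(block, width=16)   # many execution units
narrow = schedule_block(block, width=1)  # one instruction at a time
print(len(wide), "steps with 16 units;", len(narrow), "steps with 1 unit")
```

With enough execution units, the number of steps is set by the longest chain of dependencies in the block rather than by the instruction count, which is the effect the IBM statement is pointing at.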
The prototype motherboard to be shown next week contains four 366MHz TRIPS chips, along with 8GB of memory. (The architecture can support up to 32 chips and 64GB of memory.) This test system should reach 45 gigaflops.
“The processor core is composed of multiple copies of five different types of tiles interconnected via microarchitectural networks,” UT says on its website. “Each core may be configured in a single threaded mode or in a 4-thread multithreaded mode in which instructions from multiple threads may execute simultaneously. A TRIPS processor core is fundamentally distributed for technology scalability and to provide high bandwidth to the instruction cache, data cache, and register file through partitioning and replication.”
The researchers plan to follow the motherboard release by filling a rack with eight systems linked together – a setup that should reach 375 gigaflops.
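The board and rack figures can be roughly sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes that the 16 operations per core quoted earlier are issued every clock cycle and that every core stays fully busy. Neither assumption is stated in the article, and the function name `peak_gigaops` is made up for this example.

```python
def peak_gigaops(chips, cores_per_chip, ops_per_core_per_cycle, clock_mhz):
    """Peak operations per second, in billions.

    Assumes every core issues its maximum number of operations on every
    cycle, an idealized upper bound rather than a measured result.
    """
    return chips * cores_per_chip * ops_per_core_per_cycle * clock_mhz / 1000.0


# Prototype board: four 366MHz chips, two cores each, 16 operations per core.
board = peak_gigaops(chips=4, cores_per_chip=2,
                     ops_per_core_per_cycle=16, clock_mhz=366)

# Rack: eight such boards linked together.
rack = 8 * board

print(f"board peak: {board:.1f} gigaops")
print(f"rack peak:  {rack:.1f} gigaops")
```

Under those assumptions the board works out to a shade under 47 billion operations per second, in the neighborhood of the 45 gigaflops quoted, and eight boards land at roughly 375, in line with the rack figure.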
One always needs to approach these research projects with a healthy amount of caution. Academia – even when backed by $11m in Defense Department funding and IBM – tends to move very, very slowly, and the bright ideas of researchers often fail to pan out.
Less cynical types who really want to see the future now can travel over to the TRIPS web site.
“I believe there is a strong need for very capable, high performance uniprocessor cores,” Keckler said. “Will they look exactly like TRIPS? That's a good question. I think we have a very credible case.”
In the coming years, the TRIPS group hopes to convince a commercial partner to pull the technology out of the labs.®
I enjoyed our discussion today. However, I'm not sure how you came up with this statement:
"The easily excited, however, will want to temper their enthusiasm, since the so-called TRIPS (Tera-op Reliable Intelligently adaptive Processing System) project seems to move at an un-lubed snail's pace."
Those familiar with the semiconductor industry would recognize that the industrial design cycle for a leading-edge microprocessor is 3-4 years, and that's given the fact that the instruction set architecture is already in place and that the company already has substantial experience building several previous generations. The TRIPS research cycle has been: 3 years for research concepts and early proof of concept (not part of industry design cycle), 3 years of implementation/fabrication of a never-before-built processor with a newly invented instruction set, and 6 months of silicon bringup and system implementation.
Also along the way, we developed a new compiler for the new ISA. Thus the time from research concept to prototype has actually been quite un-snail-like given what we set out to accomplish.
The early projections of a Tera-OP by 2010 made some assumptions about clock rate scaling (10GHz) and better device scaling than actually came to pass. That said, 8 slightly modified TRIPS cores could easily fit on a current generation 65nm chip, which running at 3GHz would achieve a peak performance of 768 billion instructions per second. Thus it is not unreasonable to expect the technology to support a trillion instructions per second in 2010.
Computer Architecture and Technology Lab
The University of Texas at Austin