Why Make the Bet?
So why make the bet? We don't have much of a choice. No one can build a faster computer (meaning higher clock speeds) to boost performance. And while there are shared computing infrastructures that allow many people to share the performance inherent in multicore processors (even if their slices don't run applications particularly faster), that is not really solving it. (It may be all we get, though.)
Perhaps more importantly, said Patterson, everyone has been forced into trying to crack the parallelism problem, and the courage people get from this hard, cold fact will spur innovation.
Rather then take on parallelism at the system level, the Par Lab - established two years ago - is focusing on parallelism inside individual processors and system-on-a-chip designs. This is where the laptop question above will be answered, after all. And the project has a goal of designing the programming methods that produce efficient and portable software that can run on 100 or more cores and can scale as the core counts in single-chip machines double ever two years.
Solving this particular variant of the parallelism problem is going to take some mind shifting across the IT industry. First, we must stop asking the question about who will need a 100-core processor to run Microsoft Word. Patterson said that while he is a Word user - and he likes Word - the question irks him.
"Questions like that make me think we have failed as educators," Patterson said. The real issue has nothing to do with supporting legacy applications. "I am pretty sure that the best software has not been written yet," Patterson said. And he gave a few examples of neat projects that could eat up a lot of parallel processing capacity in a single system.
The first was a loudspeaker array using 120 tweeters to create a 3D sound system, which is a prototype that is actually running at Berkeley's Center for New Music and Audio Technology. A similar use of the signal processing technology created through software on a parallel chip could be used as a hearing augmenter for laptops and handheld devices. Take it a step further and put some facial recognition software on it, creating something Patterson called a "name whisperer." This device would tell you who is coming up to you to talk and why you might care based on an archive of conversations you have had.
Next page: The New Parallel Paradigm
Really enjoying this stuff.
A few things I would like to add to the mix, in no particular order:
1) IMHO, one of the biggest problems is the skill sets of the 'new generation' of programmers. I had a graduate who apparently was a Java guru. Yet he had no concept of ASCII. He did not understand *how* toLower (or LCase in VB) *actually* worked. To him, it was just 'magic black box' stuff.
2) Given the above, if we gave that graduate, say, a 40 core Intellasys processor (which are available now, off the shelf, yes, *FORTY* cores), what would he do with it?
3) All of the above does not mean that this graduate is thick/stupid/whatever. Actually, he was really bright, and has gone on to do well. However, the standard of his degree course at university was appalling. Until we can get back to 'brass tacks' in the educational side of things, we are not going to produce people with the *knowledge* (note: not talent; you are born with talent) to take the latest multi-core processors and do something truly radical, and ground breaking with them.
4) One day, I got two graduates together. I put the following to them:
"We need to build a computer system that can control a radio telescope. A big huge fucking radio telescope. Not only will it control the movement of the dish in real time in order to track moving objects in the sky, it must also gather the data received from the telescope and store it so that it can be reviewed in real time, online, by multiple users at the same time. Furthermore, the data should be stored historically and available for instant recall so that comparisons can be made with older data. All this, while ensuring that the telescope is moved efficiently, without burning out the motors in the drive gear. What do you suggest?"
They came up with credible solutions, none of which were wrong particularly, and were a reflection on modern programming/system analysis thought trends...
"Well, we'll use a few computers... One for an SQL database, one for tracking the telescope, and one for viewing data."
"Ok, great. But that's an awful lot of processing power. How will they communicate with each other?"
"Using XML over a LAN."
"Yes that will work. But if you use XML, you will need an XML parser, and code to package your data into XML packets - some sort of object model..."
"Yes, we will abstract each item of data into objects, these can remoted over the LAN using SOAP."
"Ok, its sounding pretty cool. XML is really only useful though when you need to share your data with third parties, where it needs to travel through firewalls, and be parsable by another machine that may not necessarily be running the same platform as you. We're talking about a system that is self contained, connected via a switch. Couldn't we just use sockets and our own protocol? Wouldn't that be much more efficient?"
"Well yeah, but, that would be difficult..."
Then I leave them goggle eyed when I say, "actually guys, I'm pulling your chain. This problem has already been solved. In 1971. By Chuck Moore. He did the whole thing on one PDP-11 with a single disk drive and 32K of RAM."
Sometimes, I really do think we've gone backwards.
Mines the one with the "Threaded Interpretive Languages" (1981) book in it. Sometimes we should go back and read the old stuff, lest it not be forgotten. It might teach us something.
Back to the mid-80s
So, let's get this straight - Patterson, who kick-started RISC architectures in the early 80s is talking up new paradigms of parallel processing, a hot topic from the mid-80s.
Thing is, we solved it in the mid-80s, with the INMOS Transputer. INMOS was therefore sold off by the Tories as soon as possible. The Transputer was pure genius since it was able to easily map programs that ran internally on a simulated multi-processor to an actual multi-processor environment: so the language encouraged parallel programming and it scaled from 1 to 1000s of devices.
Let's do a bit of Math. The early Transputers ran at 20MHz (giving 20 simple MIPS of performance) and probably had about 100K transistors in them each with at least 4K of on-chip RAM (+off chip too). In 1989 I ran my dissertation project on a 9-transputer rack giving me: 20*9=180MIPS of performance.
Let's scale that by 2 decades. Instead of 20MHz we have 3GHz ( x 150) and instead of 100K transistors we have 2 billion transistors (x200). That's equivalent to 20*150*200 = an astonishing 600,000 MIPS of performance / Transputer (with an internal memory equivalent to 800K). My equivalent transputer rack would have 4.9TIPS of power!
Instead we decided to base the future of computing on the (literally) back-of-an-envelope design which has set us back 20 years. I'll grab my coat.
-cheers from julz @P
Niagara nonsense @Matt Bryant
I've only just read this falsehood from Matt Bryant:
"This failure to ramp up the infrastructure is perfectly demonstarted by Sun's Niagara chips, where they have effectively given up on the idea of keeping a core spinning and instead settled for having lot of cores idle and waiting whilst a few work"
This is precisely the opposite of the truth; the Niagara chips use many thread contexts to keep the cores busy while some threads are waiting for memory. For applications like webservers, the impact is dramatic, e.g., Zeus:
To do this, they have more memory bandwidth (including a crossbar on chip) than typical CPUs because they effectively transform a latency problem (individual threads waiting for memory access) into one of bandwidth (lots of threads accessing memory while some are executing).
The result is that individual thread performance isn't great, but for workloads comprising many threads or processes the throughput is much greater than anything else around right now, simply because so little hardware is idle.