The New Parallel Paradigm
Another mindset shift the IT industry is going to have to undergo is a change in the way we think about programming. Patterson and his team believe there should be two layers of software in this new parallel paradigm: one he called the efficiency layer, the other the productivity layer. The efficiency layer would comprise about 10 per cent of programmers - the experts at creating frameworks and libraries, the people who can get down close to the metal and wring efficiencies out of code. The remaining 90 per cent would work in the productivity layer: domain experts in particular fields or industries who take those frameworks and libraries and turn them into applications.
Now here's the neat bit. To help make parallel programming easier, Par Lab's experts want to take advantage of parallelism itself and create "auto-tuners" that run lots of different optimizations on code as it is compiled, heuristically searching for the best version of the compiled code for a particular piece of hardware. Patterson said that in early tests, an auto-tuner capable of machine learning was about 4,000 times faster than a human expert at tuning code - and tuning is the big problem with parallel architectures.
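To make the idea concrete, here is a toy sketch of auto-tuning - not Par Lab's actual tool, just the principle, applied to a classic tuning target: benchmark a blocked matrix multiply at several block sizes and keep whichever runs fastest on the machine at hand.

/* Toy sketch of the auto-tuning idea: benchmark a blocked matrix
 * multiply at several block sizes and keep whichever runs fastest
 * on this particular machine. Real auto-tuners (ATLAS, FFTW, the
 * Par Lab work) search far larger spaces, but the principle is the
 * same. Compile with: cc -O2 tune.c -o tune */
#include <stdio.h>
#include <string.h>
#include <time.h>

#define N 512                       /* matrix dimension */

static double a[N][N], b[N][N], c[N][N];

/* Multiply with a given block size, returning elapsed seconds. */
static double matmul_blocked(int bs) {
    struct timespec t0, t1;
    memset(c, 0, sizeof c);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int ii = 0; ii < N; ii += bs)
        for (int kk = 0; kk < N; kk += bs)
            for (int jj = 0; jj < N; jj += bs)
                for (int i = ii; i < ii + bs; i++)
                    for (int k = kk; k < kk + bs; k++)
                        for (int j = jj; j < jj + bs; j++)
                            c[i][j] += a[i][k] * b[k][j];
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) { a[i][j] = 1.0; b[i][j] = 2.0; }

    int sizes[] = { 8, 16, 32, 64, 128 };   /* candidates; must divide N */
    int best = sizes[0];
    double best_t = 1e30;
    for (int s = 0; s < 5; s++) {
        double t = matmul_blocked(sizes[s]);
        printf("block %3d: %.3fs\n", sizes[s], t);
        if (t < best_t) { best_t = t; best = sizes[s]; }
    }
    printf("auto-tuned block size for this machine: %d\n", best);
    return 0;
}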
There are a lot more challenges that the industry faces in coping with parallelism, and one of them might just be an explosion of custom-made processors, FPGAs, and other computing elements that get woven together into future systems that do not look like the relatively simple devices we called personal computers or servers a few years ago.
Patterson also argues that processors and the other elements of systems should have standardized methods of gathering power and performance data to feed back into the programming tools, so that efficiency programmers can figure out why the system isn't using all of its available memory bandwidth, and productivity programmers can do what-if analysis on what happens to thermals or performance if they change their code.
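The nearest thing to that feedback loop on stock hardware today is Linux's perf_event interface. A minimal sketch of counting cache misses around a region of code follows - which counters exist varies by CPU and kernel, and this illustrates the mechanism rather than anything Patterson has proposed:

/* Count cache misses around a region of code via perf_event. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

static long perf_open(struct perf_event_attr *attr) {
    /* There is no glibc wrapper, so call the syscall directly:
     * pid 0 = this process, cpu -1 = any, no group, no flags. */
    return syscall(__NR_perf_event_open, attr, 0, -1, -1, 0);
}

int main(void) {
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.size = sizeof attr;
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = PERF_COUNT_HW_CACHE_MISSES;
    attr.disabled = 1;
    attr.exclude_kernel = 1;

    int fd = (int)perf_open(&attr);
    if (fd < 0) { perror("perf_event_open"); return 1; }

    enum { SZ = 1 << 24 };              /* 16MB: bigger than the caches */
    char *buf = calloc(SZ, 1);

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

    for (int i = 0; i < SZ; i += 64)    /* region of interest: touch one */
        buf[i]++;                       /* byte per cache line           */

    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    long long misses = 0;
    if (read(fd, &misses, sizeof misses) != sizeof misses) return 1;
    printf("cache misses in region: %lld\n", misses);

    free(buf);
    close(fd);
    return 0;
}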
"There was a decade or so where we were polishing a pretty round stone," Patterson explained. "Going forward, the field is really wide open, but research really has to deliver on this. The IT industry is really going to have to deliver on doubling the core count every year and on getting value out of that."
Either that, or the software business collapses and a whole lot of IT jobs go out the window as the industry shifts from a growth market, where software keeps driving us to upgrade to faster (well, more capacious) systems, to a replacement market, where we just get a new one when the old one breaks. ®
Really enjoying this stuff.
A few things I would like to add to the mix, in no particular order:
1) IMHO, one of the biggest problems is the skill set of the 'new generation' of programmers. I had a graduate who was apparently a Java guru, yet he had no concept of ASCII. He did not understand *how* toLower (or LCase in VB) *actually* worked. To him, it was just 'magic black box' stuff. (There's a sketch of what's inside that box after point 4 below.)
2) Given the above, if we gave that graduate, say, a 40-core Intellasys processor (which is available now, off the shelf - yes, *FORTY* cores), what would he do with it?
3) None of the above means that this graduate is thick/stupid/whatever. Actually, he was really bright, and has gone on to do well. However, the standard of his degree course at university was appalling. Until we get back to 'brass tacks' on the educational side of things, we are not going to produce people with the *knowledge* (note: not talent; you are born with talent) to take the latest multi-core processors and do something truly radical and groundbreaking with them.
4) One day, I got two graduates together. I put the following to them:
"We need to build a computer system that can control a radio telescope. A big huge fucking radio telescope. Not only will it control the movement of the dish in real time in order to track moving objects in the sky, it must also gather the data received from the telescope and store it so that it can be reviewed in real time, online, by multiple users at the same time. Furthermore, the data should be stored historically and available for instant recall so that comparisons can be made with older data. All this, while ensuring that the telescope is moved efficiently, without burning out the motors in the drive gear. What do you suggest?"
They came up with credible solutions, none of which was particularly wrong, and all of which reflected current trends in programming and systems analysis...
"Well, we'll use a few computers... One for an SQL database, one for tracking the telescope, and one for viewing data."
"Ok, great. But that's an awful lot of processing power. How will they communicate with each other?"
"Using XML over a LAN."
"Yes that will work. But if you use XML, you will need an XML parser, and code to package your data into XML packets - some sort of object model..."
"Yes, we will abstract each item of data into objects, these can remoted over the LAN using SOAP."
"Ok, its sounding pretty cool. XML is really only useful though when you need to share your data with third parties, where it needs to travel through firewalls, and be parsable by another machine that may not necessarily be running the same platform as you. We're talking about a system that is self contained, connected via a switch. Couldn't we just use sockets and our own protocol? Wouldn't that be much more efficient?"
"Well yeah, but, that would be difficult..."
Then I leave them goggle-eyed when I say, "Actually guys, I'm pulling your chain. This problem has already been solved. In 1971. By Chuck Moore. He did the whole thing on one PDP-11 with a single disk drive and 32K of RAM."
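For the record, the 'difficult' option from that exchange - sockets and our own protocol - is a few dozen lines of C. A minimal sketch, with a made-up message type and raw doubles on the wire for brevity (a real protocol would pin down the float encoding and loop on short reads):

/* A four-byte type+length header in network byte order, then the
 * payload. Run over a socketpair so it's self-contained; on a LAN
 * you'd use the same two helpers on a TCP socket. */
#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>

struct hdr { uint16_t type, len; };          /* the whole wire header */

enum { MSG_DISH_POS = 1 };                   /* hypothetical message */

static int send_msg(int fd, uint16_t type, const void *body, uint16_t len) {
    struct hdr h = { htons(type), htons(len) };
    if (write(fd, &h, sizeof h) != (ssize_t)sizeof h) return -1;
    return write(fd, body, len) == len ? 0 : -1;
}

static int recv_msg(int fd, uint16_t *type, void *body, uint16_t max) {
    struct hdr h;
    if (read(fd, &h, sizeof h) != (ssize_t)sizeof h) return -1;
    *type = ntohs(h.type);
    uint16_t len = ntohs(h.len);
    if (len > max) return -1;                /* refuse oversized messages */
    return read(fd, body, len) == len ? (int)len : -1;
}

int main(void) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) return 1;

    double pos[2] = { 187.5, 42.3 };         /* azimuth, elevation */
    send_msg(sv[0], MSG_DISH_POS, pos, sizeof pos);

    uint16_t type;
    double rx[2];
    if (recv_msg(sv[1], &type, rx, sizeof rx) < 0) return 1;
    printf("msg %u: az=%.1f el=%.1f\n", (unsigned)type, rx[0], rx[1]);
    return 0;
}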
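And, as promised under point 1, the inside of the 'magic black box': in ASCII, 'A'..'Z' (0x41..0x5A) and 'a'..'z' (0x61..0x7A) differ only in bit 5, so a plain-ASCII toLower is a range check and an OR - roughly this:

#include <stdio.h>

static char ascii_tolower(char c) {
    if (c >= 'A' && c <= 'Z')
        return c | 0x20;        /* set bit 5: 'A' (0x41) -> 'a' (0x61) */
    return c;
}

int main(void) {
    const char *s = "Hello, WORLD";
    for (const char *p = s; *p; p++)
        putchar(ascii_tolower(*p));
    putchar('\n');              /* prints: hello, world */
    return 0;
}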
Sometimes, I really do think we've gone backwards.
Mine's the one with the "Threaded Interpretive Languages" (1981) book in it. Sometimes we should go back and read the old stuff, lest it be forgotten. It might teach us something.
Back to the mid-80s
So, let's get this straight - Patterson, who kick-started RISC architectures in the early 80s, is talking up new paradigms of parallel processing, a hot topic from the mid-80s.
Thing is, we solved it in the mid-80s, with the INMOS Transputer. INMOS was therefore sold off by the Tories as soon as possible. The Transputer was pure genius: programs written as if for multiple processors could run multiplexed on one chip or be mapped straight onto an actual multi-processor network, so the language (occam) encouraged parallel programming, and the same code scaled from one device to thousands.
Let's do a bit of maths. The early Transputers ran at 20MHz (giving 20 simple MIPS of performance) and had about 100K transistors each, with at least 4K of on-chip RAM (plus off-chip too). In 1989 I ran my dissertation project on a nine-Transputer rack, giving me 20 x 9 = 180 MIPS of performance.
Let's scale that by two decades. Instead of 20MHz we have 3GHz (x150), and instead of 100K transistors we have 2 billion (x20,000). That's equivalent to 20 x 150 x 20,000 = an astonishing 60 million MIPS per Transputer-equivalent (with on-chip memory equivalent to 4K x 20,000 = 80MB). My nine-Transputer rack would have 540 TIPS of power!
Instead we decided to base the future of computing on the (literally) back-of-an-envelope design which has set us back 20 years. I'll grab my coat.
-cheers from julz @P
Niagara nonsense @Matt Bryant
I've only just read this falsehood from Matt Bryant:
"This failure to ramp up the infrastructure is perfectly demonstarted by Sun's Niagara chips, where they have effectively given up on the idea of keeping a core spinning and instead settled for having lot of cores idle and waiting whilst a few work"
This is precisely the opposite of the truth: the Niagara chips use many thread contexts to keep the cores busy while some threads are waiting for memory. For applications like web servers - Zeus, for example - the impact is dramatic.
To do this, they have more memory bandwidth (including an on-chip crossbar) than typical CPUs, because they effectively transform a latency problem (individual threads waiting for memory access) into a bandwidth problem (lots of threads accessing memory while others execute).
The result is that individual thread performance isn't great, but for workloads comprising many threads or processes the throughput is much greater than anything else around right now, simply because so little hardware is idle.
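You can demonstrate the same trick in software. In the hedged sketch below (plain C and pthreads; the sizes are arbitrary), every thread is individually latency-bound on a dependent pointer chase, yet aggregate throughput keeps rising as threads are added, because the memory system services many outstanding misses at once - which is what Niagara's hardware thread contexts buy you without the software doing anything:

/* Each load in the chase depends on the one before, so a single
 * thread spends almost all its time waiting on memory; adding
 * threads raises aggregate throughput anyway.
 * Compile with: cc -O2 chase.c -pthread */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NODES (1 << 22)         /* 32MB of indices: bigger than cache */
#define STEPS (1 << 22)         /* dependent loads per thread */

static size_t nxt[NODES];

static void *chase(void *arg) {
    size_t p = (size_t)arg % NODES;     /* per-thread start point */
    for (long i = 0; i < STEPS; i++)
        p = nxt[p];                     /* each load waits on the last */
    return (void *)p;                   /* keep the result "used" */
}

int main(void) {
    /* Sattolo's algorithm: one big random cycle, so every step of
     * the chase lands on a cold cache line. */
    for (size_t i = 0; i < NODES; i++) nxt[i] = i;
    srand(1);
    for (size_t i = NODES - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;
        size_t t = nxt[i]; nxt[i] = nxt[j]; nxt[j] = t;
    }

    for (int n = 1; n <= 8; n *= 2) {
        pthread_t tid[8];
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int t = 0; t < n; t++)
            pthread_create(&tid[t], NULL, chase, (void *)(size_t)(t * 999983));
        for (int t = 0; t < n; t++)
            pthread_join(tid[t], NULL);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double s = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("%d thread(s): %.1f Mloads/s aggregate\n",
               n, n * (STEPS / 1e6) / s);
    }
    return 0;
}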