A postcard from Intel in Lisbon
Intel says 'parallel or perish'
So you thought Intel was a hardware company? In fact, it's also a major supplier of software – compilers and developer tools.
This was what the Think Parallel Intel EMEA Software Conference 2.0 in Lisbon was all about. I've only space to cover the main theme here (there was also an interesting session on Second Life that I must return to, which Intel presumably hopes will soak up the multicore processing power it's going to be providing). The tone was set by Herb Sutter (Microsoft software architect and chair of the C++ Standards Committee), who revisited his "the free lunch is over" warning.
Herb Sutter contemplates the end of the free lunch.
Basically, Sutter's saying that while Moore's Law still applies in its purest sense (you can double the number of transistors on a square inch of silicon every 24 months), this isn't delivering the expected end-user performance improvements any more.
At the same time, power consumption is becoming a major issue – in fact, it's the prime barrier to growth on the very largest systems (this is confirmed by eBay Inc technical fellow Dan Pritchett, speaking at QCon, as well as by Sun making "green computing" a keynote of SunLive07).
The answer to both the flattening of the performance curve and increasing power consumption seems to be multicore chips – and the end of the ubiquitous sequential von Neumann architecture as we know it. A small drop in clock speed yields a small drop in performance and a proportionately much larger drop in power consumption. Put several of these lower-power processors on a socket and you can ramp up performance while keeping power consumption constant. That breaks through what Intel's James Reinders (lead evangelist and director of marketing and business development), who gave the Day 1 closing keynote, called the "Power Wall".
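The clock-speed trade-off can be made concrete with a back-of-envelope model. The sketch below is not from the conference: it assumes the textbook approximation that dynamic power goes as voltage squared times frequency, that voltage can be lowered in step with frequency (so power per core goes roughly as the cube of the clock scale), and that the workload splits perfectly across cores. The function names are made up for illustration.

```python
def relative_power(freq_scale: float, cores: int = 1) -> float:
    """Power relative to one core at full clock speed.

    Assumption: dynamic power ~ V^2 * f, with voltage V lowered in step
    with frequency f, so each core's power goes as freq_scale cubed.
    """
    return cores * freq_scale ** 3


def relative_throughput(freq_scale: float, cores: int = 1) -> float:
    """Throughput relative to one core at full clock speed.

    Assumption: throughput ~ clock speed, and the work divides perfectly
    across cores (the best case, which real software rarely reaches).
    """
    return cores * freq_scale


if __name__ == "__main__":
    for cores, scale in [(1, 1.0), (1, 0.8), (2, 0.8)]:
        print(
            f"{cores} core(s) at {scale:.0%} clock: "
            f"power x{relative_power(scale, cores):.2f}, "
            f"throughput x{relative_throughput(scale, cores):.2f}"
        )
```

Under these assumptions, a 20 per cent drop in clock speed roughly halves the power per core, so two such cores fit in about the original power budget while offering up to 1.6 times the throughput, provided the software can actually use both.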
That's all very well, of course, but for MRDA. Intel has built its business model around Moore's Law and the continual upgrading of PCs. To some extent, these powerful multicore chips might look like a solution looking for a problem. However, there certainly are some problems currently thought of as HPC (High Productivity, rather than High Performance, Computing) problems (real-time 100 per cent accurate speech recognition, for example) that could move to the desktop, so let's accept Intel's worldview for now.
The fly in this particular ointment is the good old programmer, who isn't famous for an ability to handle multiprogramming or multithreading. Multithreaded applications are often brittle, usually because of race conditions and deadlocks, and many applications avoid these issues by only running on one processor.
However, we don't really want the sort of parallel programming on quad core chips which just runs the OS in a thread on one processor, Office on a second, and all the spyware that clutters up most PCs on a third – leaving just one processor for actual work...
The strong message from this conference was that programmers had better start thinking in terms of parallel processing, at least if they expect to run on Intel hardware. This certainly doesn't mean coding with locks and semaphores, however. As Reinders says: "This is the assembly language of parallel processing" - and that way lies brittle, unmaintainable code.
Neither does it mean dividing your program into, say, four threads, one per processor, because while that program may work very well on a quad processor it simply won't scale to more processors. One message from this conference was to think in terms of hundreds of processors, not now perhaps, but sooner than you might think – Intel claims that it can already build 80 core chips if it wants to. Oh, and by the way, you had better make sure that your parallel-enabled application can actually run on a single core machine, or you'll find debugging the non-parallel functionality a PITA.
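A minimal sketch of what this might look like in practice follows. It is an illustration, not anything shown at the conference: the work is cut into many small tasks whose number depends on the data rather than on the core count, a pool decides how many run at once, and `workers=1` gives a plain serial path for debugging. The names `chunked`, `sum_of_squares` and `parallel_sum_of_squares` are hypothetical. Note that in CPython a thread pool shows the structure rather than a real speed-up for CPU-bound work; a process pool could be swapped in for that.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def chunked(items, chunk_size):
    """Yield successive slices of items, each at most chunk_size long."""
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


def sum_of_squares(chunk):
    """One independent task: no shared state, so no locks are needed."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(items, workers=None, chunk_size=1000):
    """Sum the squares of items using however many workers are available.

    The number of tasks comes from the data (len(items) / chunk_size),
    not from the number of cores, so the same code runs unchanged on
    one core or on many.
    """
    workers = workers or os.cpu_count() or 1
    chunks = list(chunked(items, chunk_size))
    if workers == 1:
        # Serial path: same tasks, same order, easy to step through.
        return sum(map(sum_of_squares, chunks))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_of_squares, chunks))


if __name__ == "__main__":
    data = list(range(100_000))
    serial = parallel_sum_of_squares(data, workers=1)
    pooled = parallel_sum_of_squares(data)  # worker count taken from the machine
    print(serial == pooled)
```

Nothing in the calling code mentions four threads or four cores; the worker count is a tuning parameter, and the serial run must give the same answer as the pooled one.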
James Reinders says that, soon, a programmer who doesn't "think parallel" won't be a programmer.
Here are Reinders' rules for multiprogramming, in my words, and as I see them:
- Think parallel first. Don't even contemplate bolting on parallel processing capabilities afterwards.
- Code to express the parallel nature of the problem rather than writing thread management code. Working at this higher level is the equivalent of writing in C# or Java instead of Assembler.
- Don't tie threads to particular processors. You don't want to write programs that only run properly on a particular number of cores.
- Plan to scale through increased workload. Amdahl's Law often limits the performance gain you can get from parallel processing applied to a fixed-size workload (there is usually some significant serial part of the process which can't easily be parallelised); but Gustafson observed that if you increase the workload, the serial part of the process often remains fixed and parallel processing then lets you get through the much bigger workload in much the same elapsed time.
- Only create programs which can arbitrarily add tasks to the workload, so if more processors become available, the workload can take advantage of them.
- Only write programs that can run serially, mainly because (assuming that all new PCs will be multicore) they'll then be easier to debug. However, for the time being your programs will still be expected to run OK on legacy single processor machines - and remember that a program optimised for multiprocessors will usually run more slowly on a uniprocessor, so be aware of this and don't rush headlong into coding for multicore architectures.
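The difference between Amdahl's and Gustafson's views in the rules above can be seen with a few lines of arithmetic. This sketch uses the standard textbook forms of the two laws; the 10 per cent serial fraction is an arbitrary figure picked for illustration, not one quoted by Reinders.

```python
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Speed-up for a FIXED workload.

    serial_fraction is the share of the single-processor run time that
    cannot be parallelised.
    """
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


def gustafson_speedup(serial_fraction: float, processors: int) -> float:
    """Scaled speed-up when the workload GROWS with the processor count.

    Here serial_fraction is the share of the parallel run's time spent
    in serial code, which stays fixed as the parallel part is enlarged.
    """
    return serial_fraction + (1.0 - serial_fraction) * processors


if __name__ == "__main__":
    serial = 0.10  # assumed figure, for illustration only
    for n in (4, 8, 80):
        print(
            f"{n:3d} processors: "
            f"Amdahl x{amdahl_speedup(serial, n):.2f}, "
            f"Gustafson x{gustafson_speedup(serial, n):.2f}"
        )
```

With a 10 per cent serial part, the fixed-size speed-up can never exceed ten however many cores are added, whereas the scaled speed-up keeps growing almost in line with the core count. That is the arithmetic behind "plan to scale through increased workload".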
And some of the tools which Intel thinks will help you follow these rules are:
- The OpenMP standard, which bolts efficient parallelising onto C++ and Fortran compilers using compiler hints. This is what Sutter calls "industrial strength duct tape", but it works.
- Threaded Building Blocks – C++ algorithms for scalable threading (Reinders seems very confident in this tool).
- Thread Profiler - which highlights potential performance bottlenecks.
- Thread Checker - which detects latent race conditions and potential deadlocks.
But now a note of caution. Parallel processing has always been a holy grail of computing (although Intel came to it late, perhaps). Many of the issues talked about in this conference I've met before – on multiprocessor mainframes (the most efficient way to achieve parallel processing in practice may be the mainframe job scheduler).
I've told good programmers to think about the consequences of running on multiprocessors, only to be told that "the compiler will look after it" (in general, it can't). And I've seen the results of programmers forgetting that their code can run on several processors and, in production, things may sometimes run in the wrong order as a result. This seldom shows up in test as, even if several processors are available to the test system, the chances are that you don't process enough data to see the latent race conditions, which tend to appear when the system is overloaded.
I've had to deal with the consequences of programmers deciding that they can do locks better than IBM and coding them for themselves (the application I'm thinking of was very fast – for a while, until the consequences of never releasing locks became apparent).
This stuff seems to be hard, so we're going to need very good tools and more training. And probably, much better adherence to good development process.
Do I think that parallel processing of this sort is the way of the future? Yes, emphatically, if you run on Intel or similar models it's the only way (it seems to me) to scale computer processor power effectively. Although whether we need to scale computer processor power or whether lots of specialised small computers, another kind of parallel processing, will work better, might be another question.
Reinders tried to make the point that parallelism was intuitive. His example was the queue – it's really quite intuitive that if you have a long queue, you just need more people on the desks servicing it. Simple. But this can hide a lot of complexity – if you have more desks and shorter queues checking in at Heathrow, things go faster. But you don't expect to get past check-in and find several people are assigned to one seat.
This is a trivial example, but move back a bit and airlines have gone bust because their booking systems couldn't cope with the essentially parallel activity of selling seats in an aeroplane at travel agents across the country. Planes flying three quarters full with spare capacity to cover "collisions" for seats – or upgrading overbooked passengers for travel on the next flight - can get expensive.
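The seat "collision" is a classic check-then-act race: two agents both see the seat as free, and both sell it. The sketch below is an assumption-laden toy, not how any airline system works. The names `SeatMap`, `book` and `rush_for_seat` are invented for illustration, and an explicit lock is used only to keep the example short, even though the article's own advice is to prefer higher-level tools over hand-coded locks.

```python
import threading


class SeatMap:
    """Toy seat allocator in which check and assignment are one atomic step."""

    def __init__(self):
        self._assigned = {}            # seat -> passenger
        self._lock = threading.Lock()

    def book(self, seat, passenger):
        """Return True if passenger got the seat, False if already taken."""
        # 'with' releases the lock even if an exception is raised, so the
        # lock cannot be held forever by mistake.
        with self._lock:
            if seat in self._assigned:     # check ...
                return False
            self._assigned[seat] = passenger  # ... then act, atomically
            return True

    def holder(self, seat):
        with self._lock:
            return self._assigned.get(seat)


def rush_for_seat(seat_map, seat, passengers):
    """Have every passenger try for the same seat at once; return the winners."""
    winners = []

    def attempt(passenger):
        if seat_map.book(seat, passenger):
            winners.append(passenger)  # list.append is thread-safe in CPython

    threads = [threading.Thread(target=attempt, args=(p,)) for p in passengers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return winners


if __name__ == "__main__":
    seats = SeatMap()
    winners = rush_for_seat(seats, "12A", [f"passenger-{i}" for i in range(20)])
    print(len(winners), seats.holder("12A") == winners[0])
```

Without the lock, the check and the assignment could interleave between threads and two passengers could both be told the seat was theirs; as noted earlier, that kind of fault tends to surface only under load.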
Do I think that parallelism is intuitive? "Only up to a point, Lord Copper". The consensus among the speakers at the conference was that this would be a revolution in thinking comparable with the OO revolution or structured programming. And (rather like OO) it will probably only become routine once the "old guard" dies off and a new generation of graduates that knows no other way of thinking takes over. ®