Hey, software snobs: Hardware love can set your code free
The two go hand in hand when scaling data-crunching systems
Comment In computing there are many, many different ways to run down other people’s work, not the least of which is: “OK, so they removed the bottleneck, but only by throwing faster hardware at it.”
The implication is that tackling an issue just with software is intrinsically better. When did you ever hear anyone say: “OK, so they removed the bottleneck, but only by better coding?”
The truth is that solving computing problems always involves both hardware and software; the trick is not to look for specific kinds of solution, but to find the most effective one. Of course "effective" in analytical terms can be defined in a host of different ways – cost, speed of implementation, reliability and so on.
And to make matters more complex, the most effective solution to a given problem varies with time. For example, when Relational Online Analytical Processing (ROLAP) was all the rage, star schemas were often woefully under-indexed. So the software fix of applying sensible indices was very effective – and much better than merely throwing hardware at the problem. These days, a more effective solution might be to chuck in a solid-state drive (SSD).
There can be no doubt that SSD technology speeds up the I/O of a system, many times over in some cases, and can be spectacularly better than spinning disk for random reads and writes. And as SSDs continue to plummet in price and get faster, we expect to see them increasingly replace delicate software hand-tuning such as indices, calculated redundant columns, and horizontal and/or vertical splitting of tables.
And, to switch to another valid measure of efficiency, we would argue that SSDs generally require less maintenance than software fixes and design tweaks like these, and are therefore better by that measure.
This is not to suggest you should abandon software/design changes and use SSDs in all analytical systems, but it does illustrate the point that “throwing hardware” at a problem can be the correct solution.
Quick wins for accelerating the performance of software can be found by increasing the speed and capacity of the system memory. True, it’s volatile, so if the plug is pulled you lose all that lovely data, but analytical systems work almost exclusively on copies of the data. Memory capacity continues to rise as prices continue to drop and systems with many gigabytes of main memory are now common.
On the other hand, the volume of data continues to grow exponentially and we know that a great deal of data is accessed only rarely, so memory, while very useful, is never going to be the only answer.
For years we have got away with relying on CPUs acquiring ever-increasing numbers of transistors (just ask Gordon Moore) and becoming faster and faster. Recently however we have hit a hard limit in the speed of CPU cores and are resorting to using more of them to allow parallel processing. Resourceful types are finding good use cases for GPUs, chips architected originally for graphics processing, as these are designed to perform mathematical operations at high speed across many cores.
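The shift from faster cores to more of them can be sketched without a GPU at all: Python's standard-library multiprocessing module spreads a CPU-bound job across cores in the same "more workers, not faster workers" spirit. This is a minimal illustration only; the function names are ours, not from the article.

```python
# Minimal sketch: splitting a CPU-bound job across cores with the
# standard library. A sum of squares stands in for any divisible
# workload; each worker handles one slice of the range.
from multiprocessing import Pool

def sum_of_squares(bounds):
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))

def parallel_sum_of_squares(n, workers=4):
    # Carve [0, n) into one slice per worker.
    step = n // workers
    chunks = [(k * step, (k + 1) * step) for k in range(workers)]
    chunks[-1] = (chunks[-1][0], n)  # last slice absorbs any remainder
    with Pool(workers) as pool:
        return sum(pool.map(sum_of_squares, chunks))

if __name__ == "__main__":
    assert parallel_sum_of_squares(10_000) == sum(i * i for i in range(10_000))
```

For a toy sum like this the process start-up cost swamps any gain; the pattern only pays off when each slice does real work – which is exactly the article's point that hardware parallelism is no free lunch.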
There is a reason for Software Smugness
You haven't heard of the reverse boast "only by throwing software at it" for a very simple reason: if I can get more performance out of the same hardware by designing an O(N) algorithm to replace an O(N^2) one, I am being smart. Throwing more hardware at a problem when a better algorithmic solution exists is stupid.
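The commenter's O(N) versus O(N^2) swap can be shown with a textbook stand-in (our example, not the commenter's code): detecting duplicates by comparing every pair is quadratic, while a single pass over a hash set is linear.

```python
def has_duplicates_quadratic(items):
    # O(N^2): compare every pair of elements.
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False

def has_duplicates_linear(items):
    # O(N): one pass, remembering what we've seen in a hash set.
    seen = set()
    for x in items:
        if x in seen:
            return True
        seen.add(x)
    return False
```

Both return the same answers; on a million elements the first does up to half a trillion comparisons while the second does a million set lookups – no new hardware required.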
I have seen people use weeks of wall-clock time on a 512 core segment of a big machine, simply because their code was bad. My colleague coded the thing properly in C++ and had the code running on his desktop and finishing in a few minutes (O(2^N) vs O(N log N) if I recall correctly). Only throwing hardware at a problem is often wrong. Thinking about better algorithms is never a bad idea.
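The actual code behind that anecdote isn't shown, so as a generic stand-in for an exponential-versus-fast rewrite: naive recursive Fibonacci makes roughly O(2^N) calls, while the memoised version computes each value once.

```python
from functools import lru_cache

def fib_exponential(n):
    # Naive recursion: the call tree doubles at each level, ~O(2^N).
    if n < 2:
        return n
    return fib_exponential(n - 1) + fib_exponential(n - 2)

@lru_cache(maxsize=None)
def fib_linear(n):
    # Memoised: each n is computed once, so O(N) work overall.
    if n < 2:
        return n
    return fib_linear(n - 1) + fib_linear(n - 2)
```

Same answer, wildly different cost: fib_exponential(40) takes tens of seconds of desktop time, fib_linear(40) is effectively instant – the "weeks on 512 cores versus minutes on a desktop" gap in miniature.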
Once you have really thought about the algorithmics, then you can start throwing more hardware at it (and once you do that, you must rethink the algorithmics again, especially when doing parallel stuff). So in our massive image processing stuff (Gpixel and Tpixel), we first minimize communication and disk-access overhead, and then move to SSD or Fusion-IO stuff.
"The most amazing achievement of the computer software industry is its continuing cancellation of the steady and staggering gains made by the computer hardware industry. "
— Henry Petroski
What a lot of waffle
that just boils down to:
1. Faster hardware can make stuff faster, to a point.
2. You might need to think about the algorithms you use.
Well, thanks for that wonderful insight.
As someone more or less said above, there's no point writing a load of code to parallelise a really inefficient algorithm and then chucking lots of hardware at it, if you could replace it with a non-parallel but much more efficient algorithm.
Re: There is a reason for Software Smugness
Thinking about better algorithms is never a bad idea.
Of course it is, what is really dumb is presenting that kind of moronic statement as an axiom when in reality it is merely justification for one's own vanity. Smart people use dumb methods on occasion simply because of an appreciation of real world factors and having the common sense to employ some form of cost-benefit analysis. How many chunks of code are only ever used a small number of times, possibly even once? Think about "code" in the broadest sense of the word - it could be some one-off data manipulation job or even a simple for loop at the shell prompt.
Consider a job that we know in advance is a one-off. You use your "better" algorithms and take two hours to devise a solution that does the task in two seconds. I spend two minutes knocking something up that does the same job in another two minutes. Your solution may be "better" purely in a vanity sense but is that really the best use of resources? Remember your assertion: "Thinking about better algorithms is never a bad idea."