What Carthage tells us about Amazon, Fukushima and the cloud
Rubbing salt in the wounds
A little redundancy never hurt anyone
True, that's what a lot of people already thought they were buying with this cloud stuff: vast numbers of servers in different locations, different power supplies, multiple points of access on and off t'internet and all those good things. And indeed, they were getting those things. But they were getting them with the bottleneck of one access point on and off that system: that bottleneck being Amazon's management of all that multitude of kit. And if you accept that Murphy's Law also applies to economics (which I most certainly would), then you would have predicted that, as that was the only place where the grand plan could go wrong, that's where it would go wrong: as indeed it did in that case.
Which leads us to the lesson that while we might want redundancy at the level of components in the system (the memory or processor chip, the server, the router or route), we might also want it at the level of the entire system. Multiple suppliers of cloud computing, perhaps, to provide for the possibility that any one of them might fall over flat?
It would be nice to think that this has just opened up another level of integration for some bright sparks: advising people on how to manage a cloud of clouds, but of course that management itself would then become the one bottleneck that could, and therefore would, go wrong. Or are we getting to silly levels of recursion now?
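For the numerically inclined, here is a back-of-envelope sketch of that argument in Python. It assumes failures are independent, which real outages rarely are, and every availability figure in it is invented for illustration rather than measured from Amazon or anyone else. The names `parallel`, `series`, `SERVER`, `CONTROL_PLANE` and `BROKER` are hypothetical.

```python
def parallel(availability: float, copies: int) -> float:
    """Redundant copies: the group is down only if every copy is down."""
    return 1.0 - (1.0 - availability) ** copies


def series(*availabilities: float) -> float:
    """A chain of dependencies: the whole is up only if every link is up."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


# Illustrative, made-up figures (fraction of time each thing is up).
SERVER = 0.99          # one server
CONTROL_PLANE = 0.999  # the single management layer in front of the kit
BROKER = 0.999         # a hypothetical "cloud of clouds" manager

# Lots of redundant kit behind one management layer.
servers = parallel(SERVER, 4)
one_cloud = series(servers, CONTROL_PLANE)

# Two independent suppliers, each with its own management layer.
two_clouds = parallel(one_cloud, 2)

# The same two suppliers, reached only through a single broker.
brokered = series(two_clouds, BROKER)

print(f"four redundant servers:     {servers:.8f}")
print(f"one cloud (with its boss):  {one_cloud:.8f}")
print(f"two independent clouds:     {two_clouds:.8f}")
print(f"two clouds behind a broker: {brokered:.8f}")
```

Under these assumptions, piling on servers cannot lift a single cloud above the availability of its one management layer, a second supplier does lift it, and a single broker in front of both suppliers caps it once again.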
Let's move the analogy to something even older than economics: agriculture. Over the millennia since we invented the idea, we've shifted where the system's resilience lies. Where we once grew multiple crops on the same farm – betting that not all would be wiped out by passing troubles – we later made the food supply more fragile at the local level through monocultures and then, further on in history, made ourselves more resilient to famine through geographic dispersion. The principle does seem to work well enough for food: the risk of someone starving when they're plugged into the global system is the lowest it has ever been.
Cloud computing needs to find the correct balance of risks, and will find this through the usual market-based experimentation. Will multiple components make systems sufficiently resilient? Or will multiple systems be necessary to provide that desired level of redundancy? ®