Beowulf Gods — rip into cloud's coding entrails

Slay distributed dragons with old-school skills

Thought patterns

Working with distributed computing requires a different thought process than the old client-server days. Your Windows NT-era financials app and its centralised database are nothing like even a simple web setup using a load balancer.

Just the introduction of that load balancer means thinking about networking in a different way, and that's before we get into the complexity of n-tier architectures with clusters of servers handling services at each tier.

Some of those services may be front-ended by a load balancer (which may or may not be clustered), some of those services may be multicast affairs and some may use something as primitive as low-TTL DNS-based failover.

Back in the day, if you wanted to do this sort of distributed computing you had to learn all the niggly little bits that made it go. Beowulf clusters were (and are) a right pain to set up, and a clustered MySQL, Apache, load balancer configuration was at least a couple of days' research, if not more.

Today, you can go get Caringo as an off-the-shelf software product, set the thing up in a VM, configure a rack or two full of nodes for PXE boot and have a fault-tolerant distributed object storage system up in about half an hour.

If you have a credit card then CloudFlare can add a complete content delivery network to your website in 10 minutes. Distributed computing has moved from a rite of passage to colour-by-numbers.

It's a good thing that this is getting easier, if only because you don't have to be a very big company today to have needs that move beyond what a single node can deliver. After 15 years of having "server consolidation" pounded into us by virtualisation teams, that might seem a little counterintuitive. For everyone except the smallest companies, however, I'm willing to bet you can't fit all your storage in a single node.

The downside to this revolution in distributed computing is that the fundamentals aren't being taught. Building your Beowulf cluster wasn't just about lashing together a bunch of computers; it was also about learning Amdahl's and Gustafson's laws.

Learning distributed computing the hard way taught us when we got more oomph from a brand-new, faster node versus adding more of the same to the cluster. We learned that a Beowulf cluster of cell phones is never going to deliver better performance per watt than a great big stonking Xeon, if for no other reason than that there's pitifully little code out there which can use a bunch of weedy little processors efficiently enough to outperform the 'leccy-guzzling Xeon.
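Those two laws are worth having at your fingertips, and they fit in a few lines. A minimal sketch (function names are my own, not from any particular library): Amdahl's law caps the speedup you can get from adding nodes when part of the job stays serial, while Gustafson's law says the picture improves if the problem size grows along with the cluster.

```python
def amdahl_speedup(n, p):
    """Amdahl's law: speedup on n nodes when a fraction p of the
    work parallelises perfectly and (1 - p) remains serial."""
    return 1.0 / ((1.0 - p) + p / n)


def gustafson_speedup(n, p):
    """Gustafson's law: scaled speedup when the workload grows with
    the cluster, so the parallel fraction p does n times the work."""
    return (1.0 - p) + p * n
```

With a job that is 95 per cent parallel, Amdahl's law says no cluster, however large, ever gets you past a 20x speedup (1 / 0.05); that's the point at which adding nodes just doesn't help anymore, and a faster single node wins.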

This lack of basic knowledge leaves many of today's distributed computing customers vulnerable. Yes, the ability to get a distributed something-or-other off the shelf can relieve a bottleneck in their infrastructure.

Unfortunately, distributed computing only scales so far before adding nodes just doesn't help you anymore.

At some point you have to go back to the code – either to your own internal teams or to the vendor – and get your efficiencies there. This can be through increased parallelisation (so that you can make larger clusters viable), by breaking the work up into even more tiers (so you can use multiple clusters to solve different problems), or by making load balancers a possibility (so that you can use multiple clusters to solve the same problem).

Knowing when to do this, where to look, and when the performance improvements promised by the off-the-shelf distributed computing salesdroid are even possible requires knowledge and experience. So dust off your old gear and build a Beowulf cluster or two.

With distributed computing back and vendors selling dreams of clustering in a box or using software glue, the job you save may be your own. ®
