Beowulf Gods — rip into cloud's coding entrails
Slay distributed dragons with old-school skills
Distributed computing is no longer something that only occurs in universities or the basements of the really frightening nerds with Beowulf clusters made of yesteryear's recycled horrors.
Distributed computing is sneaking back into our data centres on a number of fronts and it looks like it's probably here to stay. The thing is, those of us hardened in the ways of Beowulf are likely to have an edge when it comes to wrangling today’s abstracted but supposedly easier-to-use approaches.
Before I get into that, it’s worth making sure we’re all on the same page. Distributed computing is one of those terms that has evolved over the years and is used differently by different people.
As I view it, distributed computing is when a group of individual computers (nodes) work together towards the same goal (usually the provisioning of a given service), but do not have a shared memory space. That is, each node communicates with other nodes by passing messages in some form or another, instead of directly addressing the RAM of another node.
Systems in which the nodes do share a memory space (where each node can directly address the RAM of another node) are better described as parallel computing.
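To make the distinction concrete, here is a minimal Python sketch that adds up the same list both ways. It is a toy under loud assumptions: threads stand in for nodes, `queue.Queue` stands in for the network, and the function names (`distributed_sum`, `shared_memory_sum`) are made up for illustration. Real nodes would be separate machines passing messages over a wire.

```python
import queue
import threading


def worker(inbox, outbox):
    """A pretend node: it keeps no shared state and only reacts to messages."""
    while True:
        message = inbox.get()
        if message is None:  # shutdown message
            break
        job_id, numbers = message
        outbox.put((job_id, sum(numbers)))  # reply with a message, not a shared variable


def distributed_sum(data, n_nodes=3):
    """Message-passing style: send each node a slice, collect the replies."""
    results = queue.Queue()
    inboxes = [queue.Queue() for _ in range(n_nodes)]
    nodes = [threading.Thread(target=worker, args=(inbox, results))
             for inbox in inboxes]
    for node in nodes:
        node.start()
    for i, inbox in enumerate(inboxes):
        inbox.put((i, data[i::n_nodes]))  # every n-th item goes to node i
    partials = [results.get() for _ in range(n_nodes)]
    for inbox in inboxes:
        inbox.put(None)
    for node in nodes:
        node.join()
    return sum(partial for _, partial in partials)


def shared_memory_sum(data, n_workers=3):
    """Shared-memory style: every worker writes to the same total directly."""
    total = [0]
    lock = threading.Lock()

    def add_slice(chunk):
        partial = sum(chunk)
        with lock:  # guard the memory that all workers can address
            total[0] += partial

    workers = [threading.Thread(target=add_slice, args=(data[i::n_workers],))
               for i in range(n_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return total[0]
```

Both functions give the same answer. The difference is that the first one never lets a worker touch anyone else's data, which is the property that lets the workers live on different boxes.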
One to many or many to one
According to some definitions, the client-server model should be considered as distributed computing. Something (usually a database) runs at the server while something else (usually a thick application containing a user interface as well as all the business logic code) runs on the client.
The client and the server act in concert to deliver the information to the user.
While I understand why some would classify this as distributed computing, it is not what I mean when I talk about the modern resurgence of distributed computing. At its most basic, the client-server model uses one relatively powerful server to service the requests of multiple clients.
To understand what I mean by distributed computing, think about storage technologies and companies such as Gluster, Hadoop, Caringo, Coho Data, Exablox, and the elventy squillion others that are so similar they've collectively become indistinguishable.
This is not multiple clients talking to one server; it's multiple clients talking to huge clusters of servers, because we just can't build a single box big enough any more.
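As a rough illustration of clients talking to a cluster rather than to one big box, the sketch below spreads keys across several pretend nodes by hashing each key. This is one simple placement scheme chosen for the example, not a description of how any of the products named above actually work, and the node names, `pick_node` and `Cluster` are all hypothetical.

```python
import hashlib

# Hypothetical node names; a real cluster would use host addresses.
NODES = ["node-a", "node-b", "node-c"]


def pick_node(key, nodes=NODES):
    """Map a key to a node by hashing it, so every client picks the same node."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


class Cluster:
    """In-memory stand-in for a cluster: one private dict per pretend node."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.stores = {name: {} for name in self.nodes}

    def put(self, key, value):
        self.stores[pick_node(key, self.nodes)][key] = value

    def get(self, key):
        return self.stores[pick_node(key, self.nodes)][key]
```

A client calls `put` and `get` as if there were a single server, while the data ends up on whichever node the hash selects. Note that plain modulo hashing like this reshuffles most keys when a node is added or removed, which is one reason real systems tend to use something more elaborate.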
Virtualisation clusters can be thought of in the same way: a bunch of boxes are lashed together to present a single server – a hypervisor on which to run VMs – in a transparent fashion. Distributed databases are a thing, we've been distributing web servers forever, and content delivery networks can be bought as a service.