Dumping gear in the public cloud: It's about ease of use, stupid
Look at the numbers – co-location might work out cheaper
Sysadmin blog Public cloud computing has finally started to make sense to me. A recent conversation with a fellow sysadmin had me rocking back and forth in a corner muttering "that's illogical".
When I emerged from my nervous breakdown I realised that capitalising on the irrationality of the decision-making process within most companies is what makes public cloud computing financially viable.
For certain niche applications, cloud computing makes perfect sense. "You can spin up your workloads and then spin them down when you don't need them" is the traditional line of tripe trotted out by the faithful.
The problem is that you can't actually do this in the real world: the overwhelming majority of companies have quite a few workloads that aren't particularly dynamic. We have these lovely legacy static workloads that sit there and make the meter tick by.
Most companies absolutely do have non-production instances that could be spun down. According to enterprise sysadmins I've spoken to, they feel that many dev and test environments could be turned off approximately 50 per cent of the time. If you consider that there are typically three non-production environments for every production environment, this legitimately could be a set of workloads that would do well in the cloud.
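As a rough back-of-envelope check on those figures, the sketch below works out what share of total instance-hours could be switched off. It assumes every environment is the same size and that production never sleeps, neither of which the sysadmins quoted above specified; the function name and parameters are purely illustrative.

```python
def idle_fraction(non_prod_per_prod: int = 3, non_prod_off_share: float = 0.5) -> float:
    """Share of all instance-hours that could be switched off.

    Assumes (hypothetically) that every environment is the same size and
    that production runs all the time.
    """
    total_envs = 1 + non_prod_per_prod  # one production plus its non-production copies
    idle_envs = non_prod_per_prod * non_prod_off_share
    return idle_envs / total_envs


if __name__ == "__main__":
    print(f"{idle_fraction():.1%} of instance-hours could be spun down")
```

With three non-production environments per production environment, each off half the time, that comes to 37.5 per cent of instance-hours under these assumptions.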
While that is certainly worth consideration, it only really works if it's implemented properly. Even if you can spin some workloads up and down enough to make hosting them in the public cloud cheaper than local, do you know how to automate that? If you don't – or can't – automate some or all of those workloads, are you going to remember to spin them up as needed? What if you get sick?
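The decision half of that automation can be very small. Below is a minimal sketch of one possible policy, assuming dev and test only need to be up on weekdays during an extended working day; the hours, the function name and the policy itself are hypothetical, and the actual start and stop calls are left out because they depend entirely on your provider.

```python
from datetime import datetime


def should_be_running(now: datetime, start_hour: int = 7, end_hour: int = 19) -> bool:
    """Return True if a dev/test environment should be up at `now`.

    Hypothetical policy: weekdays only, between start_hour (inclusive)
    and end_hour (exclusive), in whatever timezone `now` is expressed in.
    """
    is_weekday = now.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and start_hour <= now.hour < end_hour
```

A scheduled job would call this every so often and start or stop the environment to match. With these defaults an environment is up for 60 of the 168 hours in a week, and nobody has to remember anything when they are off sick.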
For the majority of workloads proposed to be placed in the public cloud, I always seem to be able to design a cheaper local alternative fairly easily. This often applies even to the one workload for which cloud computing is arguably best suited: outsourcing your disaster recovery (DR) setup.
Colocation is still a thing
When I talk about DR with most businesses – big or small – they have a binary view of the world. They see the options as either building their own DR site, or using a public cloud provider. Somewhere in the past five years we seem to have collectively forgotten that a vast range of alternative options exist.
The first and most obvious option is simple colocation. There are any number of data centres in the world that will rent you anything from a few Us of rack space to several racks' worth for peanuts. Or, at least, "peanuts" when compared to the cost of public cloud computing or rolling your own secondary data centre.
In addition to traditional unmanaged colocation, most colocation providers will offer you up dedicated servers. Here they pay the initial capital cost of the hardware and lease it to you along with the rack space. There's also fully managed hosting available for both "you own the hardware" and "you lease the hardware" options.
In almost all cases these colocated solutions are cheaper than a public cloud provider for DR, and DR is the only bulk public cloud workload for which I've come close to making the finances work for businesses smaller than a 1,000-seat enterprise. (Okay, dev and test under some circumstances can be worth it as well.)
So how is it that so many businesses choose the public cloud? As the debate unfolded I began to realise that the viability of the public cloud has nothing to do with the viability of the economic arguments and everything to do with politics.
How I look at the world
I need a point of reference to design a solution, so I am going with the numbers my heretofore unnamed debate partner supplied. They have about 100 VMs and 50TB of data across 8 nodes with 256GB of active RAM in their production environment. Let's say that I am intent on reproducing the full capacity offsite.
Racks, racks everywhere!
I can build a Supermicro 6047R-E1R36N server with 384GB of RAM, 136TB of raw storage and a pair of 8 core CPUs for less than $40k. I typically include a pair of Intel S3700 SSDs and an LSI controller for that price. This gives me lots of horsepower to play with.
Supermicro servers support "advanced" or "mirrored" ECC memory, allowing me to functionally RAID 1 the RAM. That gives me a storage server with 192GB of usable RAM that is absolutely rock solid. I get roughly 100TB of RAID 6 or 63TB of RAID 10 storage out of that configuration. Throw the Intel SSDs at the LSI controller and you have "hybrid" storage that caches writes to the SSDs when more IOPS are required than the underlying disks can deliver.
If you need more compute capacity you can get it cheaply using Supermicro's Twin series servers. A 2U Twin will get you 4 nodes in 2U. A FatTwin will get you 8 nodes in 4U. Diskless, and with about 256GB of RAM, I get these systems for about $5000 per node.
We need 8 nodes, so that will run us $40k. If you don't need fancy networking capabilities you can buy a Netgear 24-port 10GbE switch for about $5000, or pick up a Supermicro SSE-X24S for about $7500 if you need a few more nerd knobs. Buy two for redundancy.
If you're hyper-paranoid about your data storage and you absolutely require that your DR site be protected by more than just RAID, you can duplicate the storage server and toss on $15k for Starwind's HA SAN software.
To recap, that's 8 nodes of compute with 256GB of RAM each, running on a literally bulletproof 100TB usable RAID 6 + RAIN 1 storage setup all lashed together with 10GbE for around $150k.
It's only a little over $100k if you're cool with the storage using only RAID (instead of RAID + RAIN) for redundancy, and $40k if you just need a great big box of offsite storage and don't need the compute capacity.
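For anyone who wants to check the sums, here is a minimal sketch that totals the prices quoted above. The function and constant names are made up for illustration, and anything not itemised in the text (cabling, rails, colo fees, licences) is left out, so the results land close to the rounded totals rather than exactly on them.

```python
# Prices are the rough US dollar figures quoted in the text; nothing else is included.
STORAGE_SERVER = 40_000   # one Supermicro storage server, "less than $40k"
COMPUTE_NODE = 5_000      # per diskless Twin node with about 256GB of RAM
HA_SAN_SOFTWARE = 15_000  # Starwind HA SAN software


def dr_hardware_cost(compute_nodes: int = 8, storage_servers: int = 1,
                     switches: int = 2, switch_price: int = 5_000,
                     ha_software: bool = False) -> int:
    """Sum the itemised prices for one DR build (hypothetical helper)."""
    total = storage_servers * STORAGE_SERVER
    total += compute_nodes * COMPUTE_NODE
    total += switches * switch_price
    if ha_software:
        total += HA_SAN_SOFTWARE
    return total


if __name__ == "__main__":
    # RAID + RAIN build with the pricier Supermicro switches
    print(dr_hardware_cost(storage_servers=2, switch_price=7_500, ha_software=True))
    # RAID-only build with the Netgear switches
    print(dr_hardware_cost())
    # Great big box of offsite storage, no compute or switching
    print(dr_hardware_cost(compute_nodes=0, switches=0))
```

The full RAID + RAIN build with two $7,500 switches sums to $150,000, and the storage-only box to $40,000. The RAID-only build sums to $90,000 with the cheaper switches, a little below the rounded figure quoted above.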
None of that covers the cost of operating systems on the units, and the reason for that is both simple and complex. It is simple in that disaster recovery licensing is miserable whether you use a cloud vendor or handle it yourself.
Each vendor has a different take. Some allow you a "free pass" for the instances you keep in the disaster recovery site, so long as those instances are for disaster recovery purposes only. Some vendors insist that you have a full suite of licences for both cases.
Microsoft licences purchased with Software Assurance, for example, provide rights for "cold" backups in such a scenario. They do not, however, cover live failover licences. Microsoft has even designed the rights so that, to be fully compliant in a cold failover scenario, your workloads must then be active in that DR environment for 90 days – unless, of course, you subscribe to Software Assurance.
The licences for the underlying infrastructure – the file servers, the hypervisor, etc – are equally complex. You can do the whole thing for free with KVM/OpenStack. There's also the possibility that your particular DR software and methodology can fail over some or all of the configuration and management software – and the licences with it – which may (or may not) reduce your licensing burden.
When you use the cloud for DR, all of this is a problem there too, though which licences you'll have to pay for, and which are incorporated into the fees themselves, are different. What you pay in total corporate licensing also determines your volume licensing position with various vendors, which has an impact of its own.
Politics feeds the Beast of Redmond
Of course, for me to use Supermicro in my debate with my fellow sysadmin is blasphemy. In this particular example, the sysadmin (and his bosses) in question would only consider hardware for a colocation setup if it was provided by the "preferred vendor" contractor they already use.
You're not locked in per se, but in reality...
The contractor, naturally, will only consider DR setups if they can use SAN level replication. Additionally they will only use hardware their techs already have certifications for. This means that the contractor will only install Cisco UCS servers and EMC storage.
Given the requirements in question, the quote came back at $300k. To be honest, that seems really low for Cisco + EMC, and I don't think it would be able to replicate the entire environment, leaving me shaking my head about the whole thing. Even if we accept the quote at face value, that's double the cost of my configuration above, and I can physically shoot half the equipment and have the DR site still work.
The contractor in question is apparently also okay with Azure Hyper-V Replica. Despite earlier assertions about SAN replication being a requirement, Hyper-V Replica doesn't use any form of SAN replication; it just replicates your VMs.
I could point out that my Supermicro solution lets you toss Hyper-V on the DR site and thus provides the same level of service as Azure for a significantly lower cost. None of that, however, matters. We'd just spiral down into circular arguments until time ends.
Changing the rules
Hardware procurement is only ever going to occur through a trusted provider and that will only ever involve the brands and configurations that the preferred vendor is comfortable with. Azure, however, is "in the public cloud" and thus the usual procurement rules don't apply.
For those of you who complain about the use of the term "cloud" to explain concepts that have been around for decades, this, right here, is exactly why the marketing wonks did it. This situation is a great example of marketing working exactly as designed.
The raw cost of running the DR setup in question in Azure is $280k. The old procurement rules would require the use of specific hardware that is at least $300k. My solution comes in at $150k, and the licensing variability could make any one of those ultimately more – or less – expensive.
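To put a number on how much licensing would have to swing before the ranking changed, here is a small sketch using only the three raw figures above. The labels and the function name are illustrative, and no actual licensing costs are modelled because none are given.

```python
# Quoted raw costs in US dollars, before licensing, as given in the text.
QUOTES = {
    "Azure": 280_000,
    "Preferred vendor (Cisco + EMC)": 300_000,
    "Supermicro colo build": 150_000,
}


def licensing_headroom(quotes: dict[str, int]) -> dict[str, int]:
    """How much more each option costs than the cheapest one, pre-licensing.

    The cheapest option could take on that much extra licensing cost,
    relative to the rival in question, before the two drew level.
    """
    cheapest = min(quotes.values())
    return {name: cost - cheapest for name, cost in quotes.items()}


if __name__ == "__main__":
    for name, gap in licensing_headroom(QUOTES).items():
        print(f"{name}: ${gap:,} above the cheapest option")
```

On these figures the Supermicro build would need to pick up $130,000 more in licensing than Azure, or $150,000 more than the preferred-vendor kit, before it stopped being the cheapest option.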
My solution doesn't use a preferred vendor's preferred vendor. It can never even be considered in this sort of situation, let alone have the licensing numbers run enough to know which solution might win out, and by how much. The public cloud, however, is just somehow different. New procurement rules can be put into place. Decades of politics and regulation bypassed because we have a new buzzword. The most economically rational solution won't be the one picked because that would require wading into the mess of red tape that is the old procurement rules.
All the old restrictions don't apply to the cloud, and that is the important lure that draws so many in.
Companies aren't drawn into the public cloud because they feel $500+ per year is a great value for a nearly useless, entry-level VM, or because the DR quote actually comes in below a managed colo solution. The cloud isn't attractive because companies actually value being locked in to a cloud vendor where they don't even own the equipment or data used to make their business go.
We can't even lie to ourselves and say that cloud computing removes the need for backups or disaster recovery planning, because that's demonstrably untrue. You still have to do it – and pay for it – even in the public cloud.
No, despite all the marketing and the chest beating, unless you have fully modern designed-for-the-public-cloud burstable workloads, the public cloud is rarely cheaper than local gear. Even with the most obviously cloud-friendly workload – disaster recovery – it's only a clear win in circumstances where the most expensive possible local equipment was chosen.
When people get this uppity, we outsource their jobs to an Asian nation. When the rules that people impose on computers drive up the cost – and the frustration level – of sourcing local equipment, we outsource those computers to the cloud.
The public cloud is attractive because it is – for the time being at least – a shortcut around bureaucracy. It may seem illogical or even irrational to many of us, but it will continue to sell. ®