Original URL: https://www.theregister.com/2013/09/23/data_backup_column/

How many apps does it take to back up your data?

Trevor Pott counts the ways

By Trevor Pott and Iain Thomson

Posted in Storage, 23rd September 2013 17:00 GMT

Sysadmin blog Which is the better approach to backups: a single service that can back up everything on your network, or a collection of applications for backing up different items?

Over the course of my career I have been on both sides of this argument and I am still not convinced either is right. Now a pending network upgrade has forced me to scrap a recent backup plan and reopen the old debate.

The more the merrier

One of the simplest reasons to look towards multiple backup applications – or multiple instances of the same application – is restore speed.

I have had to do a few full-bore disaster recovery events: they aren't pretty and they aren't fun. The process is nerve-wracking for everyone. Will the backups work? Was anything lost? What was lost? How much will this cost?

As well as the human side of the equation there is the more pragmatic time-is-money maths. Our data centres are increasingly interdependent. Restoring data for one application doesn't mean you can use it; often you need to get many more applications up and running – or even all of them – before the critical bit everyone is waiting for will function.

Some backup providers grok this, others don't, but in my experience no single application can restore as fast as several applications restoring from multiple points simultaneously.

How will I know?

The number of applications we use every day is growing. We have only to look at our own homes: we have moved from a single PC with a handful of apps to everyone having their own mobile device, each with its own unique app loadout.

How can one backup application be expected to know about all the applications in our data centres or all of the cloudy SaaS apps we are increasingly dependent upon?

A backup regime that can't back up all your data is kind of pointless, and I fear that it is increasingly unrealistic to expect any single application to do backups for every app in use.

One thing that single backup applications frequently lack is the ability to treat different classes of data with different priorities. Every backup application I have worked with for the past 10 years or so has had the ability to do backups with differing frequency depending on source. Few can do it based on data content; that still requires scripting.

I look for more: the ability to choose the backup medium (or destination) based on data source, content, document ownership and so on; automatic duplication of some categories of data to multiple destinations; and, for truly mission-critical data, an unencrypted, un-deduplicated copy ready for immediate launch in a cold standby facility.

In practice, this has meant using multiple applications just to get the feature coverage I seek.
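To give a flavour of the glue scripting involved, here is a minimal sketch in Python of content-based routing. The classification rules, paths and destinations are all invented for illustration; a real version would hang off whatever hooks your backup application and classification tooling actually expose.

```python
#!/usr/bin/env python3
"""Sketch: route files to different backup destinations by classification.

Everything here is hypothetical: the policies, paths and toy classifier
stand in for whatever rules your organisation actually has.
"""
import shutil
from pathlib import Path

# Hypothetical policy: map a classification to a backup destination.
POLICIES = {
    "mission-critical": Path("/mnt/cold-standby/plain"),  # unencrypted, un-deduped copy
    "finance":          Path("/mnt/backup/encrypted"),
    "general":          Path("/mnt/backup/deduped"),
}

def classify(path: Path) -> str:
    """Toy classifier: by location and extension. Real rules would look at
    ownership, content tags, or metadata from a classification tool."""
    if "payroll" in path.parts:
        return "finance"
    if path.suffix in {".db", ".mdf"}:
        return "mission-critical"
    return "general"

def route(source_root: Path) -> None:
    """Copy every file under source_root to the destination its class demands."""
    for f in source_root.rglob("*"):
        if not f.is_file():
            continue
        dest = POLICIES[classify(f)] / f.relative_to(source_root)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(f, dest)  # copy2 preserves timestamps

if __name__ == "__main__":
    route(Path("/srv/data"))  # hypothetical source tree
```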

Trust me, I’m a provider

A large part of the argument for multiple backup applications revolves around trust. Using a single backup provider means trusting that application or its vendor to be there for you when you need them. I have seen all sorts of things go sideways during restores and it leaves me very leery of backup providers in general.

What happens if the backup manifest is corrupted? Does the application have a means to rebuild it? Do you know how to do it on your own and, if not, would the vendor help?

If you use an online provider as the backup destination how flexible would it be? If part of your disaster involves the loss of your high-bandwidth internet connection will the company freight you disks? Will it do it without charging you the GDP of Ghana?

A single backup provider has the potential to be a single point of failure. I have certainly been in the situation where the vendor refused to support an older version of its software. The backup software naturally went sideways during restore and the lack of support was infuriating.

Similarly, I can't imagine trusting all of my backups to a single online provider. It is only where a single provider offers both local software – or a local appliance – in combination with an online service that I could see it moving past that single point of failure.

In this case, trust would depend entirely on reputation. How long has the provider been in business? Are backups core to its offering or just a transient sideline that it will throw overboard when it doesn't demonstrate year-on-year growth far exceeding analysts’ expectations?

Another consideration that pushes me towards multiple applications is centred on the trust we may or may not have in the sysadmins overseeing the process. Backup applications are complex and frequently difficult to use.

What happens if the only guy in the company who knows how to make the thing work gets hit by a bus? Are you certain that knowledge has been passed to others?

Multiple applications at least give the opportunity to make several individuals (or departments) responsible for backup applications. If you have a few people used to their own backup applications, they can usually put their heads together and figure out how a different one works.

Go it alone

The strongest argument against multiple backup applications and for a unified approach is that purchasing, maintaining and operating them is a significant operational expense. As the market matures, backup providers increasingly seek to differentiate themselves by novel pricing schemes. This makes trying to optimise backup providers for a single infrastructure a nightmare of Redmondian proportions.

The meatbags required to poke the buttons are costly too. That redundant array of backup nerds isn't feasible for smaller businesses, and even large enterprises aren't likely to be quite so paranoid about their backups.

If you are reduced to one or two backup admins, then keeping track of various backup applications can be difficult. I use a feeder approach for most of my setups. Linux systems will often back up their applications and data via cron jobs to some centralised shared storage.

Many Windows systems with touchy or niche applications will do the same. The centralised application then vacuums this all up as part of its storage run.
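To make that concrete, here is a minimal sketch of one such feeder job, the kind of thing a nightly cron entry might run: it dumps a PostgreSQL database to the centralised share and prunes old dumps. The database name, share path and retention count are assumptions for illustration, not a prescription.

```python
#!/usr/bin/env python3
"""Sketch of one 'feeder' job: dump an application's data to centralised
shared storage, where the main backup application later vacuums it up.

The pg_dump invocation, the share mount point and the retention count
are all assumptions for illustration.
"""
import subprocess
import time
from pathlib import Path

FEED_DIR = Path("/mnt/backup-feed/db01")  # hypothetical centralised share (e.g. NFS)
KEEP = 7                                  # keep a week of nightly dumps

def nightly_dump() -> None:
    FEED_DIR.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d")
    out = FEED_DIR / f"appdb-{stamp}.sql.gz"
    # Pipe pg_dump through gzip; any application-specific export works here.
    with out.open("wb") as fh:
        dump = subprocess.Popen(["pg_dump", "appdb"], stdout=subprocess.PIPE)
        subprocess.run(["gzip", "-c"], stdin=dump.stdout, stdout=fh, check=True)
        dump.wait()
        if dump.returncode != 0:
            raise RuntimeError("pg_dump failed")
    # Prune old dumps so the feed directory doesn't grow without bound.
    for old in sorted(FEED_DIR.glob("appdb-*.sql.gz"))[:-KEEP]:
        old.unlink()

if __name__ == "__main__":
    nightly_dump()  # run from cron, e.g. "15 1 * * * /usr/local/bin/feed_db01.py"
```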

Spreading tentacles

The whole approach is a terrifying tentacled monster of a scale that only 4chan could have dreamt up. I can only really get away with it because I have several admins who understand how it works and can reverse engineer it if I go missing. A single application that could back it all up would be a lot easier and quite likely more reliable.

Another real consideration is whether your backups comply with the various data protection laws and policies that you need to follow.

The rules seem to multiply quickly and may already be too complicated for part-time admins to keep track of. A decent backup provider will be on top of this as it is a great way to set itself apart. It will have legal experts to decode and interpret the laws and project managers to turn that into something that developers can code.

A patchwork of overlapping applications is far more likely to handle data in a non-compliant manner than a single application from a vendor which has committed itself to security.

Virtualisation to the rescue

The issues surrounding application proliferation and application-specific backups are becoming less important as virtualisation takes hold. If we cannot back up the individual application, chances are that we can simply back up the entire operating system it lives in.

While this is a bit like swatting flies with a nuke, other technologies such as deduplication are stepping in to make space issues less of a problem.

Virtualisation also makes continuous data protection a far more realistic goal. The idea is that every bit written by a system is backed up in (or near) real time. In a private-cloud environment I don't see why this isn't doable: real-time virtual machine replication is something all the major virtualisation vendors are working on.

If that can't be hijacked for your backup needs, most virtualisation setups use centralised storage anyway, so you can simply mirror bits from there. Either way, you direct a copy of the bitstream to a backup appliance (physical or virtual) within your data centre, deduplicate it and fire it off to a target for storage. That target can be local, it can be in the cloud or both. But to which cloud?
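Before worrying about which cloud, the dedupe step itself is simple enough to sketch. Below is a minimal, assumption-laden illustration of fixed-size chunk deduplication in Python: the chunk size, chunk-store path and source image are arbitrary, and real appliances generally use smarter, variable-size content-defined chunking.

```python
#!/usr/bin/env python3
"""Sketch: the deduplicate-then-ship step for a backup bitstream.

Fixed-size chunking with SHA-256 fingerprints: only chunks the store
hasn't seen before are kept (and would be shipped to the local or
cloud target). All paths and sizes are illustrative assumptions.
"""
import hashlib
from pathlib import Path

CHUNK_SIZE = 4 * 1024 * 1024        # 4 MiB, an arbitrary choice
STORE = Path("/var/backup/chunks")  # hypothetical local chunk store

def dedupe_stream(source: Path) -> list[str]:
    """Split a file into chunks, store each unique chunk once, and return
    the ordered list of fingerprints (the 'recipe' to rebuild the file)."""
    STORE.mkdir(parents=True, exist_ok=True)
    recipe = []
    with source.open("rb") as fh:
        while chunk := fh.read(CHUNK_SIZE):
            digest = hashlib.sha256(chunk).hexdigest()
            recipe.append(digest)
            blob = STORE / digest
            if not blob.exists():   # new data: store it (and ship it off-site)
                blob.write_bytes(chunk)
    return recipe

if __name__ == "__main__":
    manifest = dedupe_stream(Path("/var/lib/vm/disk0.img"))  # hypothetical VM disk
    print(f"{len(manifest)} chunks, {len(set(manifest))} unique")
```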

Leave it to the big boys

Here we are deep into “I don't even want to imagine what designing this out of a patchwork of backup applications would look like”. Unfortunately, this is exactly the sort of conundrum I have to focus on.

Admin jobs in smaller businesses are evaporating; we are transitioning from sysadmins in charge of a single company's IT to managed service providers handling multiple companies. Luckily for me, a Canadian company, Asigra, specialises in exactly this.

It offers software I can put in my own data centre to serve as the backup target for my clients. I don't have to worry about the legalities of dealing with extra-territorial laws. I am not comfortable backing up my clients' data to another country and I flat out don't have the time to design my own backup regime from scratch.

What about you, dear readers? This article has supplied broad generalisations and I am interested to hear how you approach backups. Do you use multiple applications or a single one? Do you use cloud backups at all or are you building your own clouds to serve as the destination for your clients?

Answers in the comments, please. ®