Spot the joints: You say backup, I say archiving

Think of it as a spectrum of data availability

If you have ever been asked to recover an old, lost or deleted file, you will know just how hard people find it to tell the difference between backup and archiving. The administrator's workload has grown so much that backup companies have even added user self-service portals to ease it.

The problem has grown more acute as companies have moved the backup process off tape and onto disk-based arrays and appliances to get faster backups and restores. After all, modern disk-to-disk backup appliances look remarkably similar to the sort of disk arrays now typically used for secondary storage.

But there are a lot of challenges and problems associated with using a backup as an archive. One of the most obvious is how to find stuff: once you have run backups for three or four years, finding stuff is going to be impossible unless you have strong indexing and data management tools.

Using your backup as if it were an archive is undoubtedly inefficient, but so is operating the two as independent systems, each with its own hardware and software.

Points on a spectrum

The best approach is to treat the two simply as separate points along a spectrum of data availability – after all, they often use much the same underlying hardware. The main differences lie in the roles they play.

Increasingly, that underlying hardware is more alike than we might think, says Steve Mackey, vice president international at Spectra Logic, a tape library vendor which a few years ago pivoted away from backup and towards archiving.

“We used to design tape libraries primarily for backup and then repurpose them for archiving. Now they are primarily designed for archiving,” Mackey says.

“It's all about integrity of data, the quality of the media, recoverability and so on. Every archiving system also has a disk cache. Archiving involves multiple different technologies.”

Briefly, a backup is usually a secondary copy of primary data for system recovery, while an archive is the primary copy of that archived data stored on cheaper and lower-performance hardware, whether locally or in the cloud.

Indeed, you might very well need to back up your archive, suggests Frank Reichart, senior director product marketing storage at Fujitsu Technology Solutions.

“Typically we see three problems. The first is that a lot of users still don't see the difference and are treating backup as an archive,” he says.

“That is a very inefficient way to work, as there are typically multiple copies of the same data in backup processes (daily, weekly, monthly backups and so on), and also different versions depending on the time backups are made. Archived data needs to exist only once and in one final version.

“The second thing is retrieval. Backup is very poor for getting specific data back.”
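To put rough numbers on Reichart's point about multiple copies, here is a minimal sketch of how much space a single file can occupy across a backup rotation compared with one archived copy. The retention counts (7 daily, 4 weekly, 12 monthly) and the function names are illustrative assumptions, and the sum deliberately ignores incremental backups, de-duplication and compression, all of which would shrink the real figure.

```python
# Hypothetical retention schedule: how many copies of each kind are kept.
# These defaults are assumptions for illustration, not figures from the article.
def copies_retained(daily=7, weekly=4, monthly=12):
    """Total number of backup copies held under the rotation."""
    return daily + weekly + monthly


def backup_footprint_gb(file_size_gb, **schedule):
    """Worst-case space used if every retained backup holds a full copy."""
    return file_size_gb * copies_retained(**schedule)


size_gb = 2.0                       # one unchanging 2 GB file
in_backup = backup_footprint_gb(size_gb)
in_archive = size_gb                # an archive keeps one final version

print(f"backup rotation: {in_backup} GB, archive: {in_archive} GB")
```

Under those assumptions the same file is held 23 times in the backup rotation and once in the archive.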

Class divisions

One option is to have a single integrated and unified data protection appliance that can provide both functionalities.

“An intelligent backup appliance can keep backup and archiving logically separate while converging the hardware,” Reichart says.

“If people understand the difference and implement good backup and archiving, they typically have separate storage for each. That is not necessary with intelligent data protection appliances though, as they can share the hardware and have common services.”

There is even an emerging division within archiving, with two different classes of archive each needing different service levels, according to Bob Plumridge, the chair of storage industry group SNIA-Europe.

“You need to ask why you are archiving. Is it for compliance? For business reasons? Or just because you are keeping everything for safety's sake? People used to archive to tape because they thought they would never touch that data again,” he says.

“There are products that will scan your backups for age or rate of change – for example, should this element be in an archive instead because it hasn't changed in months? There are also a lot more online archives, where data is protected and immediately accessible, but not in your backup cycle. It's a third group of data, a lot of which has come about through regulatory changes.”
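The age-based scan Plumridge describes can be sketched in a few lines. This is only an illustration of the idea, not how any particular product works: the `CatalogEntry` type, the `archive_candidates` function, the 180-day threshold and the sample paths are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class CatalogEntry:
    """One item in a (hypothetical) backup catalog."""
    path: str
    last_modified: datetime


def archive_candidates(entries, now, min_age_days=180):
    """Return entries that have not changed for at least min_age_days."""
    cutoff = now - timedelta(days=min_age_days)
    return [e for e in entries if e.last_modified <= cutoff]


# Sample data, invented for the example.
now = datetime(2014, 8, 1)
catalog = [
    CatalogEntry("/projects/2012/final-report.pdf", datetime(2012, 11, 3)),
    CatalogEntry("/projects/current/budget.xlsx", datetime(2014, 7, 28)),
]

for entry in archive_candidates(catalog, now):
    print(entry.path)
```

A real tool would also weigh rate of change, compliance rules and access patterns; age alone is the simplest possible test.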

Instant gratification

Steve Mackey agrees. “Content is also an archive. We would describe it as an active archive, with people constantly accessing it. For example, in big broadcast archives you might need access at short notice to news segments to use in an obituary, say. When you need it, you need it fast, but you don't know when that will be,” he says.

“The other kind of archive is stuff you want to keep but you don't know if you will need to access it or not. You still want it secure, often because it's regulated, and you need the ability to prove deletion, for instance. You might need to recover it for an audit, but there is no requirement for regular access.”

It is important to keep in mind that these are just more points on that data storage spectrum. They are separate use-cases perhaps, but ideally they need to be part of the same integrated and co-ordinated management framework. They all use the same underlying technology, whether that be disk and tape arrays or de-duplication and data compression software.

If it sounds familiar, that is not too surprising: this is pretty much what hierarchical storage management promised back in the 1980s and 90s. The same ideas were repackaged a decade later, first as part of the information lifecycle management (ILM) concept.

And then when the ILM name became tainted by some high-profile project failures, they reappeared as storage tiering. Since then they have gradually worked their way into pretty much every serious storage management software or storage subsystem.

Valuable finds

The ability to use the same disk-to-disk appliance for both backup and archiving could pay off in other ways as companies realise the value that still exists in their older data.

“If you think about big-data analytics, most is focused on real-time analysis. But in the future it could be more on what can we do with the last few years' data,” says Plumridge.

“So more and more organisations are looking at online archiving and not to tape. These are not particularly huge archives today but it will be interesting to see what happens when they get to multiple petabytes. Could some of them move to tape?

“One other aspect to consider is when the archive is not disused data as such, but is actually a working content store – it just happens to be older content that has been moved off the primary systems and storage.”

It seems clear though that for most organisations there are opportunities for optimisation when it comes to the overlaps between backup and that other kind of long-term “for safety's sake” storage.

Whether you call it deep archiving, ILM or storage tiering, and whether you store it on disk, tape or both, it seems there are significant storage savings to be made. ®
