Spot the joints: You say backup, I say archiving

Think of it as a spectrum of data availability

If you have ever been asked to recover an old, lost or deleted file, you will know just how hard people find it to tell the difference between backup and archiving. The administrator's workload has grown so much that backup companies have even added user self-service portals to ease it.

The problem has become more acute as companies have moved the backup process off tape and onto disk-based arrays and appliances to get faster backups and restores. After all, modern disk-to-disk backup appliances look remarkably similar to the sort of disk arrays typically used now for secondary storage.

But there are a lot of challenges and problems associated with using a backup as an archive. One of the most obvious is how to find stuff: once you have run backups for three or four years, locating any specific file is going to be close to impossible unless you have strong indexing and data management tools.

Using your backup as if it were an archive is undoubtedly inefficient, but so is operating the two as independent systems, each with its own hardware and software.

Points on a spectrum

The best approach is to treat the two simply as separate points along a spectrum of data availability – after all, they often use much the same underlying hardware. The main differences lie in the roles they play.

Increasingly, that underlying hardware is more alike than we might think, says Steve Mackey, vice president international at Spectra Logic, a tape library vendor which a few years ago pivoted away from backup and towards archiving.

“We used to design tape libraries primarily for backup and then repurpose them for archiving. Now they are primarily designed for archiving,” Mackey says.

“It's all about integrity of data, the quality of the media, recoverability and so on. Every archiving system also has a disk cache. Archiving involves multiple different technologies.”

Briefly, a backup is usually a secondary copy of primary data for system recovery, while an archive is the primary copy of that archived data stored on cheaper and lower-performance hardware, whether locally or in the cloud.

Indeed, you might very well need to back up your archive, suggests Frank Reichart, senior director product marketing storage at Fujitsu Technology Solutions.

“Typically we see three problems. The first is that a lot of users still don't see the difference and are treating backup as an archive,” he says.

“That is a very inefficient way to work, as there are typically multiple copies of the same data in backup processes (daily, weekly, monthly backups and so on), and also different versions depending on the time backups are made. Archived data needs to exist only once and in one final version.

“The second thing is retrieval. Backup is very poor for getting specific data back.”

Class divisions

One option is to have a single integrated and unified data protection appliance that can provide both functionalities.

“An intelligent backup appliance can keep backup and archiving logically separate while converging the hardware,” Reichart says.

“If people understand the difference and implement good backup and archiving, they typically have separate storage for each. That is not necessary with intelligent data protection appliances though, as they can share the hardware and have common services.”

There is even an emerging division within archiving, with two different classes of archive each needing different service levels, according to Bob Plumridge, the chair of storage industry group SNIA-Europe.

“You need to ask why you are archiving. Is it for compliance? For business reasons? Or just because you are keeping everything for safety's sake? People used to archive to tape because they thought they would never touch that data again,” he says.

“There are products that will scan your backups for age or rate of change – for example, should this element be in an archive instead because it hasn't changed in months? There are also a lot more online archives, where data is protected and immediately accessible, but not in your backup cycle. It's a third group of data, a lot of which has come about through regulatory changes.”
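The age-based scan Plumridge describes can be illustrated with a minimal sketch. This is not how any particular product works: the function name `find_archive_candidates`, the 180-day default threshold and the use of file modification time as the only signal are all assumptions made for illustration. A real tool would probably also weigh access patterns, rate of change and retention policy.

```python
import os
import time
from pathlib import Path


def find_archive_candidates(root, min_idle_days=180, now=None):
    """Return files under `root` not modified for at least `min_idle_days`.

    Hypothetical helper: modification time is used as a rough proxy for
    "this data has stopped changing and may belong in an archive".
    """
    now = time.time() if now is None else now
    cutoff = now - min_idle_days * 86400  # 86400 seconds per day
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                mtime = path.stat().st_mtime
            except OSError:
                # File vanished or is unreadable; skip it rather than fail.
                continue
            if mtime <= cutoff:
                candidates.append(path)
    return sorted(candidates)
```

Calling `find_archive_candidates("/path/to/data", min_idle_days=90)` would list files untouched for roughly three months; the path and threshold here are placeholders, and deciding what then moves to an archive remains a policy question.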

Instant gratification

Steve Mackey agrees. “Content is also an archive. We would describe it as an active archive, with people constantly accessing it. For example, in big broadcast archives you might need access at short notice to news segments to use in an obituary, say. When you need it, you need it fast, but you don't know when that will be,” he says.

“The other kind of archive is stuff you want to keep but you don't know if you will need to access it or not. You still want it secure, often because it's regulated, and you need the ability to prove deletion, for instance. You might need to recover it for an audit, but there is no requirement for regular access.”

It is important to keep in mind that these are just more points on that data storage spectrum. They are separate use-cases perhaps, but ideally they need to be part of the same integrated and co-ordinated management framework. They all use the same underlying technology, whether that be disk and tape arrays or de-duplication and data compression software.

If it sounds familiar, that is not too surprising: this is pretty much what hierarchical storage management promised back in the 1980s and 90s. The same ideas were subsequently repackaged a decade later, first as part of the information lifecycle management (ILM) concept.

And then when the ILM name became tainted by some high-profile project failures, they reappeared as storage tiering. Since then they have gradually worked their way into pretty much every serious storage management software or storage subsystem.

Valuable finds

The ability to use the same disk-to-disk appliance for both backup and archiving could pay off in other ways as companies realise the value that still exists in their older data.

“If you think about big-data analytics, most is focused on real-time analysis. But in the future it could be more on what can we do with the last few years' data,” says Plumridge.

“So more and more organisations are looking at online archiving and not to tape. These are not particularly huge archives today but it will be interesting to see what happens when they get to multiple petabytes. Could some of them move to tape?

“One other aspect to consider is when the archive is not disused data as such, but is actually a working content store – it just happens to be older content that has been moved off the primary systems and storage.”

It seems clear though that for most organisations there are opportunities for optimisation when it comes to the overlaps between backup and that other kind of long-term “for safety's sake” storage.

Whether you call it deep archiving, ILM or storage tiering, and whether you store it on disk, tape or both, it seems there are significant storage savings to be made. ®
