Data retention, FOI and the storage budget
Managing the archive
You the Expert: We set you a challenge. How do you resolve the conflicting requirements of data retention rules, the Freedom of Information Act and managing a storage budget?
Email, as you know, clogs up your storage boxes like nothing else. And if your policy is “hang on to it, you never know when you might need it” you could find yourself buying storage like it is going out of fashion. But is any other policy safe?
To sum up: Data retention is an increasingly complicated area. How do you make sure you are covered, without blowing your storage budget?
You guys had some interesting things to say in the comments, and some unflattering things to say about some of your users. For shame. You should have them better trained, BOFH-style.
Suggestions included uploading the backlog of email on to the stockpiles of floppies that doubtless exist somewhere in the universe.
Rather more seriously, you suggested various methods for keeping email volume down, moving everything into the cloud, and working out whether the storage you need would cost more than the fine for non-compliance if you didn’t have it.
And so to our expert panel. Let’s begin with the voice from the audience.
Graeme Fowler – Reader Expert and Internet Plumbing Geek
Firstly, a disclaimer. I am not an expert in data retention, FOI, data protection and so on. I consider myself an “informed layman”, having run reasonably large mail systems since 1997 in several environments.
Backend storage grows and never ever shrinks. It costs money. As systems become more highly available, storage costs escalate – but the budget isn't always there to support it.
Vendors like to sell their products as the answer to every single “compliance” issue they can possibly imagine. Almost all purport to reduce the dependency on high-cost, high-performance, highly-available enterprise storage at the same time as ensuring your organisation is “compliant”.
But with what is it that you're expected to comply? And why?
Asking yourself those two questions is vital. You'll never achieve what you need without understanding your requirements first. That's basic project management but is something I've seen many people struggle over when presented with ShinyGadget Retention and Compliance Version 2.1; they end up led by the technology.
The answers depend entirely upon the environment in which you're operating. Financial institutions, Education, Public Sector, Private Sector and so on all have different requirements. None are exclusive, some overlap, and some are contradictory.
In my current role in Higher Education we have three contradictory pressures: the Data Protection Act; the Freedom of Information Act; and the cost of enterprise storage.
To keep storage costs down, we apply mailbox quotas. That means items which are not only business-relevant but could be contractually sensitive or pertain to someone's education end up being deleted in user-driven mailbox culls; these should be retained. Keeping them means remembering the “only keep data which is pertinent and necessary” directive from the DPA, at the same time as remembering that FoI (as well as the DPA) dictates that if you've kept it, you might need to reveal it at some point in the future.
Then you need to decide what to keep:
Everything? Sure, but there are endless storage or service costs attached to that (not to mention what happens if the retention system blows up). And how long do you keep it, knowing it's a ticking DPA and FoIA time-bomb?
Just the “important” things? Great idea, but who decides what is important? You need to understand your statutory obligations, industry best practice, and internal business policies, and how they differ across sections of the organisation.
Only when you've covered that will you be in a position to talk to the vendors who are after your business. You will understand – perhaps not exactly, but better than when you started – what and why you need to retain (and for how long).
That gives a pretty decent specification for a procurement exercise. It also means you've got a broad insight into how your organisation is managing the data that sloshes from side to side within your email system.
Start with the non-technical questions; they'll lead to the technical ones. They'll also make the technical ones far easier to understand. In turn, you might be able to make that budget we mentioned up there a tiny bit more manageable.
Dale Vile, MD Freeform Dynamics
The data retention policy in too many organisations has historically boiled down to ‘keep everything forever’, or something very close to it. As disks proliferate and backup tapes pile up though, such an approach is not sustainable over the longer term. In fact, some are already struggling, and even those that have taken a more discerning approach are being challenged by the rate at which unstructured data in particular is growing.
So what can be done? Throwing yet more storage at the problem may alleviate immediate symptoms but isn’t the basis for a longer term fix. Apart from the cost of additional capacity and the space required to accommodate new storage devices in the data centre or computer room, as requirements continue to escalate, more effort is required to manage everything. The choice then becomes to grow the data management team or risk losing control, neither of which is desirable in today's environment.
There are some ideas and approaches that can help, however. These are based on the principles of implementing a more selective retention regime, and making sure that data that is retained is stored as efficiently as possible:
Data classification: The principle here is that not all data is equal in terms of importance and value, and by distinguishing between different classes or categories, it is possible to develop more objective and selective retention policies.
With the classification approach, you can define the amount of time you keep certain types of documents, transaction data, and so on, which allows you to get rid of data once the required time has passed. How you classify depends on your business, and it is not always necessary to be exhaustive. A lot can be achieved, for example, by simply identifying data that does not need to be kept at all, or which may be discarded after a short period of time.
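As a rough illustration of the classification idea, the sketch below maps a handful of data classes to retention periods and picks out the items whose period has elapsed. The class names, the periods and the function names are all invented for the example; real values have to come from your own statutory and business requirements.

```python
from datetime import date, timedelta

# Hypothetical classes and retention periods in days.
# None means "keep indefinitely". These figures are illustrative only.
RETENTION_DAYS = {
    "transient": 30,
    "correspondence": 365,
    "financial": 7 * 365,
    "permanent": None,
}


def is_expired(category, created, today):
    """Return True if an item of this class has outlived its retention period."""
    days = RETENTION_DAYS[category]
    if days is None:
        return False
    return today > created + timedelta(days=days)


def items_to_discard(items, today):
    """items is a list of (name, category, created_date) tuples."""
    return [
        name
        for name, category, created in items
        if is_expired(category, created, today)
    ]
```

Even a two-class version of this table ("discard after a short period" and "everything else") captures the point made above: classification does not have to be exhaustive to be useful.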
Document versioning: Sometimes there is a need to retain all versions of a document through the various stages of drafting, review, approval and subsequent revision. This may be the case in highly regulated environments, for example.
Often, however, all that really needs to be kept is the final version. Similarly, the need to hang on to correspondence and copies of other forms of communication leading up to a final document or transaction will vary immensely depending on the industry and specific scenario. By understanding these differences and putting appropriate policies in place, backed up by solutions such as workflow and document management systems, the volume of data to be stored can again be reduced.
Storage optimisation: When it is necessary to retain data, you want to make sure this is done efficiently. Techniques such as de-duplication can dramatically reduce storage requirements, e.g. by preventing a document that was circulated as an attachment to multiple email recipients being stored multiple times in an email archive.
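To make the attachment example concrete, here is a minimal sketch of content-based de-duplication: each attachment is keyed by a hash of its bytes, so identical copies sent to many recipients are stored once. The `DedupStore` class and its method names are hypothetical, and commercial archive products will differ in how they implement this.

```python
import hashlib


class DedupStore:
    """Toy archive that stores each distinct attachment body only once."""

    def __init__(self):
        self._blobs = {}  # SHA-256 digest -> attachment bytes
        self._refs = {}   # (message_id, filename) -> digest

    def add(self, message_id, filename, data):
        digest = hashlib.sha256(data).hexdigest()
        # Only keep the bytes if this content has not been seen before.
        self._blobs.setdefault(digest, data)
        self._refs[(message_id, filename)] = digest
        return digest

    def get(self, message_id, filename):
        return self._blobs[self._refs[(message_id, filename)]]

    def stored_bytes(self):
        """Bytes actually held after de-duplication."""
        return sum(len(blob) for blob in self._blobs.values())

    def logical_bytes(self):
        """Bytes that would be held if every copy were stored separately."""
        return sum(len(self._blobs[d]) for d in self._refs.values())
```

Comparing `logical_bytes()` with `stored_bytes()` gives a quick feel for how much a mailbox full of forwarded attachments could shrink.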
There is then storage tiering, which is based on the principle of saving cost by holding data on media and devices that are 'good enough', but no more. Frequently accessed critical data may be stored on high performance resilient disc at one end of the spectrum, for example, with persistent historical data that is accessed infrequently put on cheap commodity storage, tape or even uploaded to cloud storage at the other.
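A tiering policy of this kind can be sketched as a simple rule on how recently data was accessed and whether it is business-critical. The thresholds, tier names and the `choose_tier` function below are assumptions made for illustration, not a recommendation.

```python
from datetime import date

# Hypothetical thresholds, in days since last access.
HOT_MAX_AGE_DAYS = 30
WARM_MAX_AGE_DAYS = 365


def choose_tier(last_accessed, today, critical=False):
    """Pick the cheapest tier that is still 'good enough' for the data."""
    age_days = (today - last_accessed).days
    if critical and age_days <= HOT_MAX_AGE_DAYS:
        return "high-performance disk"
    if age_days <= WARM_MAX_AGE_DAYS:
        return "commodity disk"
    return "tape or cloud archive"
```

In practice the inputs would come from storage management software rather than hand-maintained dates, but the decision it automates has roughly this shape.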
Solutions in the areas we have discussed will often help to reduce management overhead. Modern content management and workflow systems allow automation of policy implementation, and the latest storage management software will simplify administration through virtualisation techniques and auto distribution/migration of data to the most appropriate location (e.g. tier or device).
It is also worth bearing in mind that getting your act together on storage will not only reduce cost and risk, but increase the chances of users actually being able to find the data they need to make business decisions, so benefits extend well beyond the IT department.
Simon May, technical evangelist, Microsoft
Getting data retention right is one of the most complicated and thankless tasks any IT professional has to deal with. It’s a fine balancing act: get it right and you have enough data to get by and fulfil requirements, but keep the data too long or without adequate controls and you can be in a world of pain. The same is true if you don’t keep enough data.
Every industry has its own requirements around data retention periods. Some healthcare records need to be stored for up to 25 years, some for no more than six years; finance, child protection, military, pharma and other sectors have unique requirements, including permanent retention. Laying those requirements to one side because of their complex and unique nature, let’s think about the technology we have at our disposal.
When I think of storing data I think about storing email because the vast majority of information workers today store huge quantities of data in their email inboxes. I personally use it as my single, searchable, storage source and ensure most of my documentation transits through there making it easy to locate.
Most people who have used Microsoft Exchange (or indeed any email system) will doubtless be aware of archiving because they normally don’t have limitless storage available to them. It’s not uncommon for quotas as low as 100MB to be enforced in some organisations, and this can rapidly lead to fragmentation of email storage and a proliferation of .PST files and personal archives. The worst part of this is a disjointed experience for the user, who has to keep grabbing a piece of their history from a file here and a file there.
Storage costs have been driven down for years by the proliferation of data, and cloud services mean that costs can be even lower. Cloud email services, along with the trend of consumerisation, have shifted user perceptions of email storage, creating an expectation of near-limitless capacity and leading to disappointment and rage when that expectation isn’t met. Even given the economics, in some organisations I’ve heard remarks such as “but my music player has more storage than the email server”. As a professional you might think that naïve, but it’s understandable from a user perspective.
There is a lot to think about with regards to archival of email but it usually incurs costs that the business doesn’t like to think about because it rarely generates revenue. Cloud services offer quite a clever way of sidestepping some of the costs because the infrastructure is managed for you. Office 365 offers the Exchange Online Archiving Service which can archive an on-premises Exchange 2010 infrastructure to the cloud.
To put some numbers around that: by default every Exchange Online Plan 2 user gets 100GB of personal archive space – how much would that cost you to provide on-premises to all your users? There are additional concerns, other than purely storage space, that cloud services can help with too. Providers have facilities that are independently audited, have intrusion monitoring and provide secure access (SSL and TLS encrypted). Again, whilst this is nothing you cannot do on-premises, it can just be easier and cheaper in the public cloud.
If you’re interested you can find out more by reading the Microsoft Exchange Online Archiving Service description.
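One way to approach that question is a back-of-the-envelope calculation like the one below. The function name, the per-gigabyte price and the redundancy factor are placeholders rather than real figures; substitute your own numbers, and remember that power, space, backup and staff time are not included.

```python
def on_prem_archive_cost(users, gb_per_user, cost_per_gb, redundancy_factor=1.0):
    """Rough hardware cost of giving every user gb_per_user of archive space.

    redundancy_factor covers mirroring or replicas (2.0 means two full copies).
    """
    raw_gb = users * gb_per_user * redundancy_factor
    return raw_gb * cost_per_gb


# Illustrative only: 500 users, 100GB each, a made-up 0.50 per GB, two copies.
estimate = on_prem_archive_cost(500, 100, 0.50, redundancy_factor=2.0)
```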