Data retention, compliance and the storage budget
Which one do you blow first?
Compliance. Was there ever a word to strike such terror into the heart of the average techie? (OK, “Audit”. But don’t blame us, we didn’t want to say it…)
Juggling the often conflicting requirements of your budget and compliance is enough to give anyone a headache. So help us out with a question, if you would be so good.
Email, as you know, clogs up your storage boxes like nothing else. And if your policy is “hang on to it, you never know when you might need it” you could find yourself buying storage like it is going out of fashion. But is any other policy safe?
To sum up: Data retention is an increasingly complicated area. How do you make sure you are covered, without blowing your storage budget?
If you have some useful thoughts, please share them in the comments. If you don’t have any ideas, perhaps you’ll vote for the comments you think are best. We’ll be in touch with the “winner” to get a more in-depth view.
Think you can help? Get thee to the comments…
Am I the only one thinking that pastebins are the future?
Agree with most of these comments. Specific requirements depend on your location and business area.
For us, archiving was as much a way of storing business information for posterity as anything else. We have used email for less than ten years, and as a 180-year-old historic institution we are very interested in maintaining a record for future access.
Others here in the States try to reduce the term of storage in line with policy to limit liability (and to limit the cost of potential legal discovery requirements). But we just keep it all.
When we moved to a SaaS vendor, the required extra storage cost only an extra $25/mo. ...We are a small company!
I will say that not all SaaS archiving services are alike. Our first try was a disaster, which I bailed out of to a larger vendor with competitive pricing. It gets us top-notch spam filtering as well. I couldn't be happier.
First decide "what", then "how"...
Agree with comments on getting a policy right first - but make certain that you specify sufficient granularity that there is NO UNCERTAINTY about "what" you have to keep (and for how long).
An interesting hack, which is often overlooked, is to look into the treatment of "drafts" and other "not final" work. Get yourself some legal advice: you may find that your specific retention obligation is only for the FINAL database, report, etc., and that trashing the interim work product, in addition to reducing your storage requirements, actually reduces your "legal risk". (YMMV).
Oh, periodically it's worth going back and making certain that folks are being space constrained and are keeping ONLY what is required. There is a natural tendency for folks to cover their asses and "packrat" everything. Incentivise folks to manage their storage allocations by making the storage amounts public and making a fuss of those who improve the most...
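For what it's worth, the "make a fuss of those who improve the most" idea needs nothing more than two usage snapshots. A minimal sketch, assuming you can export per-user storage figures from somewhere; the function name, the user names and the numbers are all made up for illustration:

```python
from typing import Dict, List, Tuple


def most_improved(before: Dict[str, float],
                  after: Dict[str, float]) -> List[Tuple[str, float]]:
    """Rank users by the fraction of storage they freed between two snapshots.

    Users missing from the second snapshot are treated as unchanged
    (an assumption; adjust to taste).
    """
    rows = []
    for user, old in before.items():
        new = after.get(user, old)
        if old > 0:
            rows.append((user, (old - new) / old))
    # Biggest relative reduction first; growth shows up as a negative number.
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical figures in GB.
last_quarter = {"alice": 100, "bob": 200, "carol": 50}
this_quarter = {"alice": 50, "bob": 180, "carol": 60}

for user, fraction in most_improved(last_quarter, this_quarter):
    print(f"{user}: {fraction:+.0%}")
```

Ranking by relative rather than absolute reduction stops the heaviest users from winning every time.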
do it right the first time
With a written retention policy and a simple email archive solution, you will use less storage than you ever have. To top it all off, your email databases will work and back up faster with this setup.
Data retention, compliance and archiving are often the forgotten areas, as they follow the same principle as backup: no one cares about them until they need to access or restore something from some point in the past.
Often the 3 topics are driven by business requirements, and thus by the time they hit the IT infrastructure support groups they are already segmented into different solutions driven by specific application requirements. The result is multiple unaligned tools & infrastructure components that become unmanageable & costly as they grow larger & older.
On top of that come constantly increasing compliance requirements.
The way to handle it is first of all to define, top down, a company policy/directive about what has to be kept, how & for how long, but just as important: what has to be deleted, how & at what age. In the end this is Information Lifecycle Management (ILM).
Defining a proper ILM policy requires defining and enforcing data classification.
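To make the link between classification and policy concrete, here is a minimal sketch of a directive that states, per data class, both how long data has to be kept and at what age it has to go. The class names and periods are hypothetical placeholders, not legal guidance, and `RetentionRule` and `disposition` are names invented for this example:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class RetentionRule:
    keep_days: int                    # minimum age before deletion is allowed
    delete_after_days: Optional[int]  # age at which deletion is required; None = never


# Hypothetical classes and periods, for illustration only.
POLICY = {
    "financial_record": RetentionRule(keep_days=7 * 365, delete_after_days=8 * 365),
    "general_email": RetentionRule(keep_days=0, delete_after_days=2 * 365),
    "historic_archive": RetentionRule(keep_days=0, delete_after_days=None),
}


def disposition(data_class: str, created: date, today: date) -> str:
    """Return 'must_keep', 'may_keep' or 'must_delete' for one item."""
    rule = POLICY[data_class]  # unclassified data raises KeyError on purpose
    age_days = (today - created).days
    if age_days < rule.keep_days:
        return "must_keep"
    if rule.delete_after_days is not None and age_days >= rule.delete_after_days:
        return "must_delete"
    return "may_keep"


print(disposition("general_email", date(2020, 1, 1), date(2023, 1, 1)))
```

Letting an unknown class fail loudly is a deliberate choice in this sketch: data that nobody classified should not quietly fall into a "keep it all" default.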
When the ILM policy/directive is defined and the data classification is there also, the next step is the process.
If no process is defined, each business unit will interpret the policy their own way and define a process that fits them, thus again ending up with a zoo of solutions in the IT infrastructure support.
The key here is to make the choice easy for those having to follow it. If they can choose only one way to do it, the choice is fairly easy.
When the process is defined, the next step is the technology. There are many tools out there. Some are purely software based and some are more deeply integrated with hardware solutions.
Depending on what compliance rules apply, one often has limited options for software & hardware that will fit.
Application integration is also tricky here, as there are few standards for it, XAM being one of the few common ones. However, with the right policy & process in place and with proper data classification, a simpler infrastructure setup can often be used, as the metadata definition has already been covered by following the policy & process.
The trend is that storage solutions in this space are moving towards scale-out, object-based platforms with a lot of logic built into the solution and with commonly supported APIs (NFS, CIFS, WebDAV / REST, XAM). A lot of data classification & compliance can be defined when setting up each data container (how many copies to keep, whether they are geo-dispersed, how long to keep the data, etc.).
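As a rough illustration of what such a per-container definition could look like, here is a sketch that does not follow any particular vendor's API. The field names, the validation rules and the container name are assumptions, and the capacity estimate assumes plain full copies (platforms using erasure coding will come out differently):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerPolicy:
    name: str
    copies: int           # number of full copies kept
    geo_dispersed: bool   # whether copies are spread across sites
    retention_days: int   # how long objects in this container are kept

    def validate(self) -> None:
        if self.copies < 1:
            raise ValueError("at least one copy is required")
        if self.geo_dispersed and self.copies < 2:
            raise ValueError("geo-dispersal needs at least two copies")
        if self.retention_days < 0:
            raise ValueError("retention cannot be negative")

    def raw_capacity_gb(self, logical_gb: float) -> float:
        """Raw capacity consumed, assuming simple replication."""
        return logical_gb * self.copies


# Hypothetical container for archived mail.
mail_archive = ContainerPolicy("mail-archive", copies=3,
                               geo_dispersed=True, retention_days=7 * 365)
mail_archive.validate()
print(mail_archive.raw_capacity_gb(100))
```

Writing the copy count down next to the retention period makes the budget impact of each compliance decision visible at the moment the container is created.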
It is key to classify the expected data before generating the data. If one starts to generate data first and then tries to classify it later on, it's typically too late.
What keeps the cost & management under control is the ILM directive/policy combined with the data classification & enforced process to follow. This way it's ensured that only the data that needs to be kept is kept, and all the rest is deleted after a pre-defined grace period. Typically such an enforced deletion policy will get rid of > 80% of the data amount.
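The enforced deletion described above can be sketched as a periodic sweep: anything without a retention obligation that is older than the grace period goes, and the sweep reports how much it would reclaim. The `Item` fields, the 90-day grace period and the sample sizes below are invented for illustration; the fraction printed reflects that sample data only, not the > 80% figure:

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Tuple


@dataclass(frozen=True)
class Item:
    name: str
    size_gb: float
    created: date
    must_keep: bool  # set by the data classification, not by the owner


def sweep(items: List[Item], today: date,
          grace_days: int) -> Tuple[List[Item], float]:
    """Return the items to delete and the fraction of total size reclaimed."""
    cutoff = today - timedelta(days=grace_days)
    to_delete = [i for i in items if not i.must_keep and i.created <= cutoff]
    total = sum(i.size_gb for i in items)
    reclaimed = sum(i.size_gb for i in to_delete)
    return to_delete, (reclaimed / total if total else 0.0)


# Hypothetical inventory.
inventory = [
    Item("signed-contract", 10, date(2020, 3, 1), must_keep=True),
    Item("old-drafts", 50, date(2023, 1, 1), must_keep=False),
    Item("recent-scratch", 40, date(2023, 12, 20), must_keep=False),
]

doomed, fraction = sweep(inventory, today=date(2024, 1, 1), grace_days=90)
print([i.name for i in doomed], fraction)
```

Running the sweep in report-only mode first, as here, gives the business a chance to object before anything is actually removed.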
What kills the storage budget in relation to compliance & data retention is typically the reality of not knowing what data is important and what is not and thus ending up having to keep it all, just in case.
On top of that, keeping data that didn't need to be kept could even be a risk and an audit finding.