Time to put 'Big Data' on a forced diet
There ain't nothing cheap about big storage
Storage: What sort do I need?
It's no surprise that your high-end database applications need high-speed SAN storage in order to perform adequately. What's interesting is that even today it's rare to see a software product's data sheet cite the product's IOPS (input/output operations per second) requirement. Just recently I was reading the spec of a software product and did a double-take because it actually cited an IOPS figure.
The point is that loads of your applications and users won't need super-fast disk. You can mix your storage infrastructure to match your storage requirements. At the disk level, you might have SATA disks for your less onerous systems and Fibre Channel disks for the heavy stuff. At the connectivity level, you might use SAN-connected storage for heavy processing and iSCSI or even NAS-style (NFS or CIFS) presentation for lighter loads. By choosing your storage wisely, even if you don't manage to reduce the amount of space you need to buy, you can at least lop a zero off the price tag of some areas of it.
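To make the matching exercise concrete, here is a minimal Python sketch that assigns each workload to the cheapest tier able to serve its IOPS requirement. The tier names echo the options above, but the IOPS ceilings, the `Workload` class and the `pick_tier` function are all assumptions invented for illustration; real figures depend entirely on your kit and should be measured rather than guessed.

```python
from dataclasses import dataclass

# Hypothetical tiers, cheapest first, each with an assumed IOPS ceiling.
# These numbers are illustrative only, not vendor or benchmark figures.
TIERS = [
    ("NAS (NFS/CIFS) on SATA", 500),
    ("iSCSI", 5_000),
    ("Fibre Channel SAN", float("inf")),
]


@dataclass
class Workload:
    name: str
    required_iops: int


def pick_tier(workload: Workload) -> str:
    """Return the first (cheapest) tier whose ceiling covers the workload."""
    for tier_name, max_iops in TIERS:
        if workload.required_iops <= max_iops:
            return tier_name
    raise ValueError(f"no tier can serve {workload.name}")


if __name__ == "__main__":
    workloads = [
        Workload("departmental file share", 200),
        Workload("mail server", 3_000),
        Workload("OLTP database", 40_000),
    ]
    for w in workloads:
        print(f"{w.name}: {pick_tier(w)}")
```

The useful part is less the code than the habit it forces: you have to write down an IOPS number for every system, which is exactly the figure most data sheets leave out.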
Data: Come on, spit it out: do you really need it all?
At the end of the day, though, you're always going to end up at this question and if you're being honest you're always going to answer it: “No, of course not”.
Take my personal data collection, for example. I now keep a box of DVDs instead of the home office server and the vast raft of external hard disks I used to have hanging around, because I realised that I probably dig only one thing per year out of the archive. Now look at the average business user (especially if they're a techie): it doesn't take that many graphics-heavy PowerPoints, downloaded ISO DVD images, backups of mail files and the like to soak up a few hundred gigabytes. Multiply this by a few hundred staff, and even with your de-duping and compression hammering away for all they're worth, your storage requirements will spiral out of control.
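The back-of-an-envelope sum is worth doing. The sketch below is a rough estimate only: the function name, the 300 staff, the 300GB per head and the 3:1 reduction ratio are all assumed figures picked to illustrate the arithmetic, not measurements.

```python
def estimate_capacity_tb(staff: int, gb_per_user: float, reduction_ratio: float) -> float:
    """Estimate physical capacity in TB (1 TB = 1000 GB) after de-dupe and compression.

    reduction_ratio is logical size divided by physical size, so 3.0 means 3:1.
    """
    if reduction_ratio <= 0:
        raise ValueError("reduction_ratio must be positive")
    logical_gb = staff * gb_per_user
    return logical_gb / reduction_ratio / 1000


if __name__ == "__main__":
    # Assumed inputs: 300 staff, 300 GB each, 3:1 data reduction.
    print(estimate_capacity_tb(300, 300, 3.0))
```

With those assumed inputs, 90TB of logical data still needs about 30TB of physical disk, and that is before you count backups and replicas.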
The thing with storage capacity management is you tend only to do something about it when you're panicking. Some organisations actively monitor and manage capacity, but most don't, which means that they only do anything about it when either (a) they get close to capacity and stuff starts slowing down, or (b) the overnight backup finally tips over the end of its window and they can't get <insert name of mission-critical system> back online before the start of business the next morning.
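Actively monitoring capacity doesn't have to mean buying a product. As a minimal sketch, the Python below uses the standard library's `shutil.disk_usage` to report how full a volume is and flag it before the panic stage. The 80 and 90 per cent thresholds and the function names are assumptions; pick values that give you enough lead time to order and install more disk.

```python
import shutil


def percent_used(path: str) -> float:
    """Return the percentage of the volume holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def capacity_status(percent: float, warn_at: float = 80.0, critical_at: float = 90.0) -> str:
    """Classify a usage percentage against (assumed) warning thresholds."""
    if percent >= critical_at:
        return "CRITICAL"
    if percent >= warn_at:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    pct = percent_used("/")
    print(f"/ is {pct:.1f}% full: {capacity_status(pct)}")
```

Run something like this from a scheduler and log the results, and you get a trend line rather than a surprise.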
So be ruthless with your data. Of course you need to keep much of it: if you're a business then the law requires you to do so, and of course in order to actually do business you also need much of it. But review it frequently, and do something about it proactively.
If you don't need it online, store it offline and show people how to get to it should they need it. If you don't really need it readily accessible but can't face binning it, archive it to tape and store the tapes safely.
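Finding the candidates for offline storage or tape is the tedious bit. One rough way to start, sketched here in Python, is to walk a directory tree and list files that haven't been modified for a given number of days. The function name and the 365-day cut-off are assumptions, and modification time is only a proxy for "nobody needs this": access times are often disabled or unreliable, so treat the output as a list to review, not a list to delete.

```python
import os
import time
from typing import Iterator, Optional, Tuple


def find_stale_files(
    root: str, older_than_days: int, now: Optional[float] = None
) -> Iterator[Tuple[str, int]]:
    """Yield (path, size_in_bytes) for files last modified before the cut-off."""
    if now is None:
        now = time.time()
    cutoff = now - older_than_days * 86400  # 86400 seconds per day
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                info = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if info.st_mtime < cutoff:
                yield path, info.st_size


if __name__ == "__main__":
    total = 0
    for path, size in find_stale_files(".", older_than_days=365):
        total += size
        print(f"{size:>12}  {path}")
    print(f"Total stale bytes: {total}")
```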
But I'll bet that after you've done all this, you'll still have hundreds of gigabytes of stuff that you actually, genuinely, really don't need. So identify it, grin smugly to yourself and throw it away. ®
Dave Cartwright is a senior network and telecoms specialist who has spent 20 years working in academia, defence, publishing and intellectual property. He is the founding and technical editor of Network Week and Techworld and his specialities include design, construction and management of global telecoms networks, infrastructure and software architecture, development and testing, database design, implementation and optimization. Dave and his family live in St Helier on the island paradise of Jersey.