Original URL: https://www.theregister.co.uk/2010/06/16/data_dung/
Managing the dung heaps of data
Time to rationalise?
Lab The question “What about the 'I' in 'IT'?” serves as an equally good reminder of the point of the ‘T’. But while information allegedly exists to support the business, from the information technology perspective it sometimes seems almost a by-product of all the communications we want to do, all the applications we want to run.
Looking at all the disparate pools of information we end up with, it begs the question of whether we’ve got it backwards. We talk in terms of relational databases, spreadsheets, analytical systems, enterprise content management, email, file stores and so on, without necessarily linking them back to the business. As a result islands of data exist across the organisation, each running on their own hardware and, for larger companies, each managed by a different set of people.
It may feel like all such capabilities have been around forever. The fact is, of course, that these are merely the repositories that organisations settled upon at various points in IT’s history. Data management can be complex, so by necessity each store has fostered a set of skills and best practices, reinforcing the idea that each repository type is “the way things should be”. Over time as well, the people managing each can become understandably protective of their domains.
Meanwhile, business needs and change events roll on. From the IT department’s perspective, technology is a project-by-project thing, with new requirements and end-of-life events driving deployment initiatives. At the same time, other functions are not alien to deploying IT systems of their own – sales organisations deploying Blackberry Enterprise Servers, for example, or indeed tech-savvy managers knocking up an Access database at the weekend, which ends up still in use three years later.
No wonder then that we see database sprawl, fragmentation of data sources, proliferation of repositories and so on, each of which can impede the business from exploiting data more effectively. From the end-user perspective, this is manifested in the difficulty of actually being able to find the right information. We’ve been researching this area for several years now, and every time we do, “finding stuff” comes top of the list of data-related issues.
Both business users and IT people live with such consequences with a certain grumbling acceptance – we all do – that is, until the situation becomes untenable. Such factors include when things stop working completely, or - equally likely - when external factors come into play, such as a new set of regulations to be addressed (PCI, anyone?) or some strategic rationalisation initiative to sort things out.
One thing’s clear: while it is possible to improve the way things are, it is a pointless to start off expecting to deal with everything at once. Rationalisation of data sources, applications or both sounds great in principle, but near-impossible in practice due to the sheer scale of the job.
An important step, therefore, is to decide what it is worth spending the time on rationalising, consolidating or migrating. This is largely a case of mapping the business value of specific repositories, or the applications that use them, against the amount of effort actually required to maintain them. A legacy database, for example, may be mission critical, minding its own business and presenting little operational overhead. Here, the fundamental IT law of “if it ain’t broke, don’t fix it” applies. But equally, low-value data sources (perhaps containing information relating to products that no longer exist) might be ripe for decommissioning, however recently they were rolled out.
Data rationalisation exercises are never to be undertaken lightly, despite what vendors might say about how good their migration tools or middleware products might be. It is worth keeping the following points in mind during any such effort:
- It is as much about dependencies as data – between users, applications and repositories, physical and logical. Ignore these at your peril!
- Context is king – data entries by themselves may make little sense, and the value may lie across multiple data sources that need to be considered together.
- Don’t attempt to boil the ocean. While this is common sense, prioritisation exercises can unravel over time to avoid scope creep.
- Plan to fail – or at least be sure that you know what to do if something goes wrong. Allow sufficient time for testing, and build in fall-back paths wherever possible. There are no absolutes, particularly not in rationalisation and consolidation. While data sources may have accumulated in a haphazard fashion, approaches to resolving them should be anything but. Perhaps one day we’ll arrive at a point where some kind of master-repository exists that can take care of itself. Until then however, it will largely be a question of managing the fallout. ®