Where do you store master data?
Questioning the hub-based approach
Comment: At IBM's recent Information on Demand conference (which was excellent, incidentally) the company presented its view of master data management (MDM). I am glad to say that this has advanced significantly since its Barcelona conference in May and the company has now recognised that you need to take a flexible approach to MDM.
The company had already appreciated that MDM needs to be treated holistically rather than as siloed solutions but it has now realised that different companies want to implement MDM for a variety of different reasons.
In Bloor Research's report on MDM, we defined three such categories: analytical MDM, whereby the emphasis is on understanding customers, products, suppliers and so forth; synchronisation, where the focus is on enabling data flow between applications based on unified entity definitions; and operational MDM, where these definitions are to be used as an SOA foundation for introducing new functional capabilities. Of course, some companies may have more than one of these business drivers underpinning their use of MDM.
IBM has now adopted a similar model, although it refers to analytical, operational and collaborative MDM. The last of these is about promoting collaborative authoring environments, while IBM uses "operational" where we would use "synchronisation".
Alongside this more flexible understanding, the company is also now more aware of the fact that if you are not going to do analytics against your master data then you are unlikely to need a hub-based approach. As a result, IBM is also now being more proactive in explaining how you can use its solutions within a registry or repository-based environment.
So, good marks all round for IBM.
However, this brings me to the title of this article. I suspect that there is a sort of assumption that all master data will be stored within your data warehouse. This view has been fostered by the hub-based approach espoused by the likes of Oracle, SAP and still, to a certain extent, by IBM. But does this approach make sense?
Clearly, if you want to calculate customer lifetime value, for example, then it makes sense to hold the relevant master data in your warehouse, because this is exactly the sort of analytic function for which it was designed. But does this still apply if you only want one of the other styles of MDM? In this case, the only queries you are going to be running against the master data are look-up queries. Moreover, you are probably going to be running a lot of them. Is the warehouse the right place to support such functionality?
I am inclined to think that the answer to this question is no. It may be convenient to put master data in the warehouse but I am not sure that this is the most efficient or cost-effective place to hold it: wouldn't it be better to have a dedicated database optimised for look-ups? Further, if that is a reasonable proposition then, in a scenario that combines analytical MDM with either or both of the other approaches, would it be better to still have a separate MDM server and then replicate that data into the warehouse (or federate it) for analysis, rather than simply relying on the warehouse?
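To make the alternative concrete, here is a minimal sketch of a separate MDM store that serves look-ups and is replicated into the warehouse for analysis. Everything in it is an assumption made for illustration: the two in-memory SQLite databases stand in for a dedicated MDM server and a warehouse, the table and function names are hypothetical, and summing order values is only a crude stand-in for a real customer lifetime value calculation. It does not represent any vendor's product.

```python
import sqlite3

# Hypothetical dedicated MDM store: one narrow, keyed table for look-ups.
mdm = sqlite3.connect(":memory:")
mdm.execute(
    "CREATE TABLE customer ("
    "customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL, country TEXT NOT NULL)"
)
mdm.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Acme Ltd", "UK"), (2, "Globex", "US")],
)
mdm.commit()

# Hypothetical warehouse: transactional facts plus a replicated master data copy.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
)
warehouse.execute(
    "CREATE TABLE fact_order (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
warehouse.executemany(
    "INSERT INTO fact_order VALUES (?, ?, ?)",
    [(10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0)],
)
warehouse.commit()


def lookup_customer(customer_id):
    """Operational look-up by key, answered by the MDM store alone."""
    return mdm.execute(
        "SELECT customer_id, name, country FROM customer WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()


def replicate_to_warehouse():
    """Full-refresh copy of master data into the warehouse (simplest possible replication)."""
    rows = mdm.execute("SELECT customer_id, name, country FROM customer").fetchall()
    with warehouse:  # commits on success, rolls back on error
        warehouse.execute("DELETE FROM dim_customer")
        warehouse.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", rows)
    return len(rows)


def total_order_value_by_customer():
    """Analytic query run in the warehouse against the replicated master data."""
    return warehouse.execute(
        "SELECT d.name, SUM(f.amount) "
        "FROM dim_customer d JOIN fact_order f ON f.customer_id = d.customer_id "
        "GROUP BY d.customer_id, d.name ORDER BY d.customer_id"
    ).fetchall()
```

In this arrangement the high-volume look-ups never touch the warehouse, and the analytic query only sees master data as of the last replication. Whether that staleness is acceptable, or whether federation would serve better, is exactly the sort of trade-off that would need to be weighed.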
I am not saying that I know the answers to these questions but I don't think that this is an issue that has been much discussed, and it needs to be.
Copyright © 2006, IT-Analysis.com