MDX: Why it will matter to application developers
That's Multi-Dimensional eXpressions to you, guv
Since when did application developers need to know about an analytical querying language like MDX? Since February 27th, next year. That’s when Microsoft will launch SQL Server 2008. Of course, you’ll still have breathing space because, as we are painfully aware, launching and actually delivering products are, in Microsoft’s world, now completely different events…..
Whenever it deigns to appear, the arrival of SQL Server 2008 will change the landscape for application developers because Business Intelligence (BI) is set to expand from the analytical world into the transactional. As that happens, MDX will become a highly saleable skill for application developers.
From the 1980s onwards, as relational databases came into use for storing transactional data, so application developers were delighted/forced to learn SQL in order to insert and extract data. Moving into the 1990s, businesses learned that there was massively valuable information (as opposed to raw data) tied up in transactional systems but it was difficult to identify and use.
So copies of the data were pulled out of the transactional systems, munged with data from other operational systems and stuffed into a data warehouse built to support analytical querying. (OK, so the technical term is ETL – Extract, Transform and Load).
At some point in the warehouse system (often in the data marts) the data is restructured as a multi-dimensional structure, also referred to as an OLAP (On-Line Analytical Processing) cube. This restructuring is performed because we have learnt (the hard way) that analytical queries are very different from transactional ones. The radically different structure makes the querying much simpler - and faster - but we need a new analytical language to query it - MDX (Multi-Dimensional eXpressions).
For users of Microsoft tools, multi-dimensional data began with SQL Server 7.0 and was strengthened in the 2000 version. With 2005 Microsoft added pro-active caching which brought us a step closer to realising a real-time analytical data store. SQL Server 2008 will be the culmination of Microsoft’s long-term plan to integrate BI into the heart of the product. Analytical database developers use MDX to manipulate and extract information; however there has, until now, been no persuasive need for developers of transaction-based applications to learn it.
Third generation BI
As real-time analysis becomes progressively more feasible, a new class of use for these systems opens up. We’ve reached the stage where we can analyse not just what has happened in the past from records captured in transactional systems, but we can perform analyses that directly affect the business process. This is a critical point, encapsulating the characteristics what can be termed ‘third generation business intelligence’. Such applications take advantage of complex real-time analysis to feed back and affect business transactions as they occur.
Imagine a widget manufacturing plant that uses second generation BI. Each time a widget rolls off the production line it’s measured for 12 characteristics: length, weight, whatever. Results for each widget are entered into a transactional database. After they have been sold and used by customers, an analysis can be performed to see which widgets were returned by unhappy customers. That information could be used by the team responsible for customer satisfaction.
In a third generation system, we simply close the loop. We perform the analysis and identify the clusters in the returns (perhaps buyers consistently return widgets that are green, longer than 23mm but weigh less than 155g.). Now, as each new widget comes down the production line it is measured and an analytical query is run against the data warehouse to see if the widget is likely to be returned. If so, it is rejected and never enters the sales line.
The development of real-time analysis means that we will need a new breed of transactional application to interact with the data. These applications must be capable of running analytical queries which look at data in a transactional system and apply rules/guidelines that come from BI analyses. For the first time we can see a need for application developers to know about analytical querying - which means known about MDX (and possibly DMX, Data Mining eXtensions, for working with data mining models.)
Surely I can do this with the SQL I know and love?
It is essentially true that SQL and MDX are broadly inter-convertible, just like assembler and C++. We don’t have to use C++ but it is worth learning because it’s better suited to application development than assembler. SQL is ideal for asking questions of a transactional database such as “How many widgets did we make yesterday?” but to ask “Is this widget of acceptable quality when judged against the BI system’s findings about buyer returns?” is truly MDX/DMX territory, as it requires an ‘understanding’ of querying and aggregation across multiple dimensions.
For the application developer who wants to stay ahead, the imminent release of SQL Server 2008 means the time has come to learn about MDX, a skill likely to appeal to an increasing pool of prospective employers. Mindful of this, Reg Developer offers an introduction to MDX in a follow-up article here. ®