Original URL: http://www.theregister.co.uk/2007/10/23/mdx_hands_on/

Hands on with MDX

Dimension and measures (yes, but can it handle cubic furlongs?)

By Mark Whitehorn

Posted in Developer, 23rd October 2007 20:51 GMT

Following our introduction to MDX (to be found here) this follow-up article is a get-you-started guide to using this powerful language to manipulate multi-dimensional data.

The basics

Relational databases store data in two-dimensional tables, a familiar concept that mimics grids of data on paper. Multi-dimensional data is fundamentally different and is stored as dimensions and measures. Dimensions describe the data: examples are Products (the items we sell), Location (where sales take place) and Time. Measures are usually numeric, like Units Sold, Sales and Budget.

We can see the principle of multi-dimensional data in a very simple two-dimensional structure. Here there are two descriptive dimensions – Products in the rows and Time in the columns. There is one numerical measure, Units Sold.

Units Sold

Jan Feb Mar Apr
Apples 125 45 78 169
Figs 74 53 89 39
Lemons 20 21 25 14
Pears 220 74 103 156

But, unlike tables in a relational database, we aren’t limited to two dimensions. Add a third – Location - and we get a 3-D cube of data.

As before the rows show Products and the columns show Time. The slices that give the cube depth show the additional Location dimension. The values held at the intersections of all three dimensions are the Units Sold. (For simplicity only four values are shown). The green cell shows the number of lemons sold in Chicago in April.

Dimensions contain members and here each dimension has four: Chicago, London, Paris and New York are the four members in the Location dimension.

A multi-dimensional data structure can have many more dimensions, and this is where any visualisation technique breaks down. We humans live in a 3-D world and imagining further dimensions is hard: however, the cube analogy is a good starting point for understanding the principle behind multi-dimensional structures. (Not entirely coincidentally, multi-dimensional OLAP structures are commonly called cubes).

Hierarchies, levels and aggregations

Dimensions are often hierarchical which enables them to hold data at different levels. A Time dimension is almost invariably hierarchical and typical levels might be years, quarters and months.

Hierarchies usually have an All level at the top to give a grand total of all data, with levels below. In this case the Year level contains members for 2006 and 2007, each of these is divided into four quarters and each quarter into three months. (Not all the months are shown.) This hierarchy lets us look at the Units Sold measure for each time period: all fig sales in a particular year, quarter or month.

During the cube-building process, data for the lowest level, called the leaf level (monthly values here but could be weekly, daily, whatever) is combined to give the higher values, quarters combine into years, and years into All. These calculated values are called aggregations.

Yes, yes, but where’s the CODE?

OK, here we go. This MDX query finds the total of apple sales in London in 2007:

SELECT
{ [Products].[Apples] } ON COLUMNS,
{ [Location].[London] } ON ROWS,
FROM [FruitMDX]
WHERE [Time].[2007]

The syntax is undeniably reminiscent of SQL, using SELECT, FROM and WHERE and, like an SQL query it returns an answer. However, unlike SQL, the main use for MDX is not querying (useful as it can be for that). In Fast Track to MDX a book about using MDX in SQL server 2000, of the 21 chapters there are but two on queries. The rest cover MDX expressions.

An MDX expression is a partial statement, a small, highly adaptable and powerful code section. The power comes from the language’s ability to 'understand' multi-dimensional data, and this is a radical difference between querying relational databases using SQL and multi-dimensional databases using MDX. In relational theory (and indeed practice) rows of data are inherently unordered. The concepts of the 'next' row or the 'previous' row are meaningless, and standard SQL has no functions for locating rows by position. In a multi-dimensional world, position is vital and MDX is well equipped to manipulate data with regard to its position.

For instance, MDX has a function called CurrentMember which identifies the cell in which you are currently interested. It acts dynamically so as your focus moves, CurrentMember moves too.

The green cell in the cube above could be referred to as:

(Location.Chicago, Products.Lemons, Time.[April])

(MDX syntax insists that member names containing numbers, spaces, special characters and keywords be in square brackets).

Now that the cell has your attention it could also be referred to as:

(Location.CurrentMember, Products.CurrentMember, Time.CurrentMember)

The huge advantage of a function that understands about position in a cube, particularly with respect to time, is that it makes analytical queries very easy to construct.

Take an apparently simple question like "How many pears have we sold since the beginning of the year?" Anyone who has played the SQL will know how painful this is to code because SQL knows nothing about position in a table. (Yes, I know we can use cursors, which are invaluable, but they are not part of standard SQL). An MDX expression to do this analysis comprises merely six words and some bracketry. Wow.

The function YTD stands for 'Year To Date'. It requires a member and returns a set of members that includes the member you gave it and all previous members at that level within the same year. If you give it the member April, it returns a set containing the values for that month and for Jan, Feb and March.

We can use the CurrentMember function to identify the member to pass to the YTD function:

YTD(Time.CurrentMember)

We want to add up the values in the set returned by YTD, so:

Sum(YTD(Time.CurrentMember))

does that, and lastly we identify the measure to use:

Sum(YTD(Time.CurrentMember), Measures.[Units Sold])

That’s the completed expression. (Oh alright, it's seven words but only because the measure has a space in its name). The code above is an MDX expression that creates a calculated measure: it generates new values from existing values in the data set. Calculated measures can be used for many purposes, like working out growth as a percentage or a monetary value or sales averages for comparison over time. They’re almost infinitely flexible.

But the really clever thing about our calculated measure is that it works everywhere in the Time dimension. If you're at the month level, say September 2007, it sums the Units Sold values for January to September 2007 inclusive, and if you’re at Q2-2006, it'll total Q1 and Q2 values in 2006. Using the CurrentMember function lets the calculated measure work dynamically, returning a result wherever you are in multi-dimensional space. Clever or what? ®