The Register® — Biting the hand that feeds IT

OLAP and the need for SPEED

In another dimension

Database myths and legends (Part 7) In this series we're looking at the myths and legends of the database world; some turn out to be true, others false. This myth is about why we use OLAP.

If you follow the Inmon model, you use a relational data warehouse for flexibility and OLAP cubes in the data marts for the speed. On the other hand, if you follow Kimball, you simply use OLAP in the core data warehouse. Either way, OLAP is where you get the incredible query response time that we need for a good Business Intelligence system. So OLAP is all about speed.

OK, let's get back to basics for a moment. OLAP stands for Online Analytical Processing, was originally very well defined, and is a surprisingly new term. It first appeared a mere 14 years ago, in a paper entitled Providing OLAP to User-Analysts: An IT Mandate by E F Codd, S B Codd and C T Salley, Computerworld, July 26 1993.

And yes, E F Codd is the Ted Codd, the father of the relational database. After the paper was published it gained some notoriety because Codd had undertaken consulting work for Arbor Software (now Hyperion). This was unfortunate because the paper actively discussed one of Arbor's products, Essbase. In the end, Computerworld took the unusual step of retracting the article; nevertheless, this paper clearly marks the start of the term's use. A copy is available online from Hyperion. The paper defines 12 rules for evaluating OLAP products, which are:

  1. Multi-dimensional conceptual view
  2. Transparency
  3. Accessibility
  4. Consistent reporting performance
  5. Client-server architecture
  6. Generic dimensionality
  7. Dynamic sparse matrix handling
  8. Multi-user support
  9. Unrestricted cross-dimensional operations
  10. Intuitive data manipulation
  11. Flexible reporting
  12. Unlimited dimensions and aggregation levels

While Codd never directly says OLAP systems should be fast, he is clearly very interested in their performance (see rule 4). In addition, almost all OLAP systems do provide a phenomenal increase in performance over relational systems. So we can argue from this that the myth is true: OLAP is about performance.

But it is clear from reading the paper that Codd also sees the multi-dimensional component of OLAP as essential. Early on in the paper he says: "This...multi-dimensional conceptual view appears to be the way most business persons naturally view their enterprise." And, as you can see, four of the 12 rules directly refer to dimensions, so OLAP is also about the way users think about, and are allowed to visualise, their data.

We know that speed is important to OLAP, but exactly how important is this multi-dimensional aspect?

One easy test of the importance of a property to the definition of an object is to imagine the object minus that property. Does it remain essentially the same object without the property or does the loss turn it into something else? Is a robin without a red breast still a robin? Are Christians who lose their faith still Christians? Is OLAP without multi-dimensionality still OLAP?

Well, imagine a relational data warehouse that is magically very, very fast. Users can perform any query they like against it and expect a response time of one second. Would this still be OLAP? We can be certain that the answer here is "no" for the simple reason that there is no need for a new term like OLAP to describe this; what we have here is simply a very fast relational database. Apart from the speed, it will suffer all the joys and pains of normal relational databases. It will be very flexible (it puts no constraints on the queries that can be posed) but the users will still find it very difficult to query because, in order to formulate the query, they have to understand the data structure. Experience suggests that business users find this very difficult.
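To make that difficulty concrete, here is a minimal sketch using Python's built-in sqlite3 module. The schema (the sales, stores and dates tables) and all the figures are invented for illustration; they do not come from any real warehouse. Even for a simple question such as "revenue by region and quarter", whoever writes the query has to know which tables exist, which keys join them and which columns to group by, however fast the engine answers.

```python
import sqlite3

# Hypothetical schema: every table name, column name and figure is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stores (store_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE dates  (date_id  INTEGER PRIMARY KEY, quarter TEXT);
    CREATE TABLE sales  (store_id INTEGER, date_id INTEGER, revenue REAL);

    INSERT INTO stores VALUES (1, 'North'), (2, 'South');
    INSERT INTO dates  VALUES (10, 'Q1'), (20, 'Q2');
    INSERT INTO sales  VALUES (1, 10, 100.0), (1, 20, 150.0),
                              (2, 10, 80.0),  (2, 10, 20.0);
""")

def revenue_by_region_and_quarter(connection):
    """Answer one business question; note how much schema knowledge it needs."""
    query = """
        SELECT st.region, d.quarter, SUM(s.revenue)
        FROM sales AS s
        JOIN stores AS st ON st.store_id = s.store_id
        JOIN dates  AS d  ON d.date_id   = s.date_id
        GROUP BY st.region, d.quarter
        ORDER BY st.region, d.quarter
    """
    return connection.execute(query).fetchall()

rows = revenue_by_region_and_quarter(conn)
for region, quarter, total in rows:
    print(region, quarter, total)
```

Nothing in the query is slow or wrong; the obstacle is that the join paths and key names live in the user's head rather than in the system.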

So, OLAP without the multi-dimensional structure isn't OLAP. This is true in the real world of 2007 and it was also true in Codd's original definition of OLAP. In the paper he says: "OLAP is the name given to the dynamic enterprise analysis required to create, manipulate, animate, and synthesise information from exegetical, contemplative, and formulaic data analysis models." In other words, OLAP is more about the data model than the speed.

The problem with the myth is that by focusing on speed it loses sight of what we are trying to achieve in Business Intelligence (of which OLAP is a subset). We are trying to find information in a mass of data. Speed alone (while eminently desirable) does not provide this; we also need to layer a framework over the data (Codd's multi-dimensional conceptual view) to provide an interpretation that users can understand.
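As a rough illustration of what "layering a framework over the data" can mean, the sketch below wraps a handful of flat rows in a tiny dimensional view. The Cube class, its total method and the sample figures are all hypothetical, written for this example only; this is not how Essbase or any other OLAP product is implemented, and it makes no attempt at speed. The point is simply that the user names dimensions and a measure, and never sees tables or joins.

```python
from collections import defaultdict

# Invented fact rows: each carries dimension values plus one measure (revenue).
FACTS = [
    {"region": "North", "quarter": "Q1", "product": "Widgets", "revenue": 100.0},
    {"region": "North", "quarter": "Q2", "product": "Widgets", "revenue": 150.0},
    {"region": "South", "quarter": "Q1", "product": "Widgets", "revenue": 80.0},
    {"region": "South", "quarter": "Q1", "product": "Gadgets", "revenue": 20.0},
]

class Cube:
    """Hypothetical helper, not a real OLAP API: a dimensional view over flat rows."""

    def __init__(self, facts, dimensions, measure):
        self.facts = facts
        self.dimensions = dimensions
        self.measure = measure

    def total(self, by=(), **filters):
        """Sum the measure, grouped by the dimensions in `by`, after slicing on `filters`."""
        for name in list(by) + list(filters):
            if name not in self.dimensions:
                raise KeyError(f"unknown dimension: {name}")
        totals = defaultdict(float)
        for row in self.facts:
            if all(row[dim] == value for dim, value in filters.items()):
                key = tuple(row[dim] for dim in by)
                totals[key] += row[self.measure]
        return dict(totals)

cube = Cube(FACTS, dimensions=("region", "quarter", "product"), measure="revenue")

by_region = cube.total(by=("region",))                     # roll up to region
q1_by_product = cube.total(by=("product",), quarter="Q1")  # slice on one quarter
grand_total = cube.total()                                 # collapse every dimension

print(by_region)
print(q1_by_product)
print(grand_total)
```

Every question here is phrased in business terms (region, quarter, product, revenue), which is the part of OLAP that raw query speed alone does not supply.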

So, the myth is busted. OLAP certainly is about speed but it isn't all about speed. There is much more to analysis than rows per second.

Incidentally, this focus within OLAP on the way in which we think about and view the data is highly relevant to some of the recent discussions on The Register about novel approaches to business intelligence.

Take, for example, Kognitio's WX2. Kognitio has developed technology that allows very rapid access to relational data (just like the example data warehouse discussed above). The technology is fascinating and provides us with another tool that we can add to our armoury of techniques. It is a great solution for certain classes of problem. But, since it doesn't provide the multi-dimensional conceptual view it can never be considered as a substitute for OLAP.

And, as a final point, albino robins do exist in nature and are still considered to be robins. As for the non-ecumenical question, that is probably better left to Father Ted. ®
