The myths and legends of the holy land of the database
The first myth explored and exploded
The database world has more myths and legends than the court of King Arthur. The current myths tend to be less about dragons and dungeons and more about features and performance, such as:
- Oracle can't do MOLAP.
- Oracle is as easy to tune as a cathedral organ.
- DB2 only runs on mainframes.
- SQL Server doesn't scale.
- Oracle is eye-wateringly expensive.
- Access isn't a relational database.
- SQL Server cannot row lock.
Sometimes, though, there is more than a hint of scandal:
- Microsoft was caught stealing secrets from Borland.
- Oracle was caught cheating at the TPC benchmarks.
- Philippe Kahn once bought Hawaiian shirts for 3,000 database conference delegates just to annoy Microsoft.
These myths tend to surface whenever database people meet, either virtually in the newsgroups or socially in their cups.
Where do these myths originate?
Well, this is a very competitive industry, and the truth is that some of them are simply made up by competitors. I know: I happened to be there at the birth of two of those listed above. Of course, I would never suggest that such fabrication could be laid at the door of the CEOs involved. I'm sure that the very smooth Larry Ellison has never been personally responsible for any of these urbane myths...
Once started, these myths are perpetuated because database people are notoriously proud of, and defensive about, their database engines. They are delighted to repeat stories that denigrate the opposition, and so the myths grow.
So, do these myths have any basis in reality, or are they all just dragon food? Well, one definition of a myth is "a popular belief that is false or unsupported by facts", another is "a truth disguised and distorted", and both fit our myths pretty well. Some are provably false; others have some basis in fact. Oddly, some, like the SQL Server row locking one, were at the same time both completely true and totally misleading.
In this series of articles we are going to look at these widely held beliefs (all of those listed above and more) and try to separate the truth from the misinformation. Unlike Sir Robert Armstrong, we will try to be uneconomical with the truth.
Let's start with an Oracle myth.
"Oracle can't do MOLAP."
In the BI (Business Intelligence) world, this is a serious charge. MOLAP (Multi-dimensional On-Line Analytical Processing) is the Wunderkind of BI. So, how does this myth stack up?
OLAP (On-Line Analytical Processing) is a way of organising data so it can be analysed very rapidly by users. It takes a base set of relational data (say, sales of individual products to particular customers, day by day) and calculates a set of aggregated values (e.g. sales of products to all customers in Essex for each month). The combination of the base and aggregated data is known as an OLAP cube.
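To make the aggregation step concrete, here is a minimal Python sketch, assuming a handful of hypothetical sales rows. The names `BASE_SALES`, `build_aggregates` and `monthly_sales` are invented for illustration and do not come from any OLAP product. The sketch rolls day-level, customer-level sales up to one total per product, county and month, which is the kind of precalculated value an OLAP cube holds.

```python
from collections import defaultdict
from datetime import date

# Hypothetical base data: one row per sale of a product to a customer on a day.
# Columns: (product, customer, county, sale_date, amount)
BASE_SALES = [
    ("fridge", "C001", "Essex", date(2004, 1, 3), 450.0),
    ("fridge", "C002", "Essex", date(2004, 1, 17), 399.0),
    ("fridge", "C003", "Kent", date(2004, 1, 20), 420.0),
    ("fridge", "C001", "Essex", date(2004, 2, 2), 475.0),
    ("kettle", "C002", "Essex", date(2004, 1, 5), 25.0),
]


def build_aggregates(rows):
    """Roll day-level, customer-level sales up to (product, county, month) totals."""
    totals = defaultdict(float)
    for product, _customer, county, sale_date, amount in rows:
        month = (sale_date.year, sale_date.month)
        totals[(product, county, month)] += amount
    return dict(totals)


def monthly_sales(aggregates, product, county):
    """Answer 'sales of <product> in <county>, month by month' from the totals alone."""
    return sorted(
        (month, total)
        for (p, c, month), total in aggregates.items()
        if p == product and c == county
    )


# Precompute once; later questions read the small aggregate set, not the base rows.
AGGREGATES = build_aggregates(BASE_SALES)
```

Once the aggregates exist, a question such as fridge sales in Essex by month is answered from the small set of precomputed totals rather than by rescanning every base row.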
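The point of building an OLAP cube is speed. When users ask to see, say, the sales of fridges month by month in a particular county, the answer comes back very rapidly because the aggregations have already been calculated rather than being worked out from the base data at query time. OLAP can be implemented in three ways: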
- ROLAP is Relational OLAP and means you are storing the base data and the aggregations in relational tables and managing them using a relational database engine.
- MOLAP stores both the base and aggregated data as a multi-dimensional array of data. This data must be managed by a specialised multi-dimensional database engine.
- HOLAP is a hybrid between the two. The base data is held in relational tables and the aggregations in a multi-dimensional array.
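The difference between the first two storage styles can be sketched in Python, using the standard `sqlite3` module as a stand-in for a relational engine and a nested list as a stand-in for a multi-dimensional array. This is an illustrative assumption, not how any particular product stores its cubes; `AGGS`, `store_rolap`, `rolap_lookup` and `Cube` are hypothetical names.

```python
import sqlite3

# Hypothetical aggregated values: (product, county, month) -> total sales.
AGGS = {
    ("fridge", "Essex", "2004-01"): 849.0,
    ("fridge", "Essex", "2004-02"): 475.0,
    ("fridge", "Kent", "2004-01"): 420.0,
    ("kettle", "Essex", "2004-01"): 25.0,
}


def store_rolap(aggs):
    """ROLAP-style: each aggregated value is an ordinary row in a relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales_agg (product TEXT, county TEXT, month TEXT, total REAL)")
    conn.executemany(
        "INSERT INTO sales_agg VALUES (?, ?, ?, ?)",
        [(p, c, m, t) for (p, c, m), t in aggs.items()],
    )
    return conn


def rolap_lookup(conn, product, county, month):
    """Find a total by querying for the matching row (0.0 if no row exists)."""
    row = conn.execute(
        "SELECT total FROM sales_agg WHERE product = ? AND county = ? AND month = ?",
        (product, county, month),
    ).fetchone()
    return row[0] if row else 0.0


class Cube:
    """MOLAP-style: a dense 3-D array of totals, one axis per dimension."""

    def __init__(self, aggs):
        products = sorted({p for p, _, _ in aggs})
        counties = sorted({c for _, c, _ in aggs})
        months = sorted({m for _, _, m in aggs})
        # Map each dimension member to its fixed position along its axis.
        self.p_pos = {p: i for i, p in enumerate(products)}
        self.c_pos = {c: i for i, c in enumerate(counties)}
        self.m_pos = {m: i for i, m in enumerate(months)}
        # Every combination of members gets a cell; empty ones stay 0.0.
        self.cells = [[[0.0] * len(months) for _ in counties] for _ in products]
        for (p, c, m), total in aggs.items():
            self.cells[self.p_pos[p]][self.c_pos[c]][self.m_pos[m]] = total

    def lookup(self, product, county, month):
        """Find a total by computing its coordinates, not by searching rows."""
        return self.cells[self.p_pos[product]][self.c_pos[county]][self.m_pos[month]]


conn = store_rolap(AGGS)
cube = Cube(AGGS)
```

In the ROLAP version each total is a row found by a query; in the MOLAP version each total is a cell found by computing its coordinates, and every combination of dimension members has a cell even when it is empty. A HOLAP arrangement would keep the base rows in the relational table and only the aggregations in something like `Cube`.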
Why three different ways to implement? Well, ROLAP means you can use the engine that you know and love. It is safe, secure and familiar. In addition, when OLAP started, there were no multi-dimensional database engines, so ROLAP was all we had.
MOLAP is a more modern approach. It usually calculates aggregations faster, and the resulting cube is generally faster to query. As a result, MOLAP is now very popular.
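HOLAP can be useful where you have a massive set of base data and you find that the users of the system very often query just the aggregated data: the bulky base data stays in familiar relational storage, while the frequently queried aggregations get the speed of the multi-dimensional array.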
Now, it turns out that when Oracle is used to create an OLAP cube, it stores both the base and aggregated data in relational tables. It does not use a separate multi-dimensional array. Since we know that relational storage equals ROLAP, and we know that Oracle only does it this way, it follows that Oracle can't do MOLAP. Oracle is old-fashioned, Oracle is dumb, Oracle is...
This is actually a very clever myth, one of the class where the facts as presented are all true but the inference is completely wrong. What the myth perpetrators leave out is an additional fact, namely that the Oracle database engine has, integrated within it, the code necessary to manipulate multi-dimensional arrays. That one fact enables Oracle to do something rather unusual.
So, what really is going on?
Well, the base data is, as always, a set of data stored in text and numeric fields in relational tables. The "normal" ROLAP process creates a set of aggregated values and stores them as numeric and text fields in a relational table. When Oracle creates an OLAP cube, it generates the aggregations as a set of multi-dimensional arrays and stores these, not as a separate structure on disk, but in BLOB (Binary Large Object) fields within a relational table. The fact that Oracle creates multi-dimensional arrays as the aggregation store tells us that this is definitely MOLAP; the only oddity is where those arrays are stored.
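The idea of keeping a multi-dimensional array inside a relational table can be sketched in Python. This is only an analogy under stated assumptions: `sqlite3` stands in for the relational engine, a flat run of 8-byte doubles stands in for the array format, and the table `cube_store` and the helpers `pack`, `unpack` and `cell` are hypothetical. Oracle's own internal format is not shown here.

```python
import sqlite3
import struct
from array import array

# Hypothetical 2 x 3 slice of a cube: rows = counties, columns = months.
SHAPE = (2, 3)
SLICE = [
    [849.0, 475.0, 510.0],
    [420.0, 0.0, 390.0],
]


def pack(grid):
    """Flatten a 2-D grid row by row into raw bytes of doubles."""
    return array("d", (v for row in grid for v in row)).tobytes()


def unpack(blob, shape):
    """Rebuild the 2-D grid from the raw bytes and its known shape."""
    flat = array("d")
    flat.frombytes(blob)
    rows, cols = shape
    return [list(flat[r * cols:(r + 1) * cols]) for r in range(rows)]


def cell(blob, shape, row, col):
    """Read one cell straight from the bytes by computing its offset."""
    _, cols = shape
    size = struct.calcsize("d")
    return struct.unpack_from("d", blob, (row * cols + col) * size)[0]


conn = sqlite3.connect(":memory:")
# An ordinary relational table whose payload column holds the array as a BLOB.
conn.execute(
    "CREATE TABLE cube_store (cube_name TEXT PRIMARY KEY, n_rows INTEGER, n_cols INTEGER, data BLOB)"
)
conn.execute("INSERT INTO cube_store VALUES (?, ?, ?, ?)", ("fridge_sales", *SHAPE, pack(SLICE)))
conn.commit()

name, n_rows, n_cols, blob = conn.execute(
    "SELECT cube_name, n_rows, n_cols, data FROM cube_store WHERE cube_name = ?",
    ("fridge_sales",),
).fetchone()
restored = unpack(blob, (n_rows, n_cols))
```

Because the array is just a column value, whatever backs up or secures the table covers the array as well, while code that knows the array's shape can still reach any cell by computing its byte offset.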
Why does Oracle store the data in this way? Glad you asked [I rather assumed it was because Oracle wants to push Oracle licenses - Ed]. Oracle argues that in terms of performance it doesn't particularly matter where the multi-dimensional array is stored. The huge advantage of storing it in a relational table is that doing so brings the data inside the world that Oracle DBAs already understand. It is within the compass of the normal backup systems, the normal security. An Oracle DBA can maintain an OLAP database as easily as a normal one because the OLAP one IS a normal one - at least in terms of day-to-day maintenance. It is a multi-dimensional one in every other respect, including performance.
(In fact, we could now start a discussion about whether it is MOLAP or HOLAP, but that discussion soon becomes a matter of degree: how much of the data is relational, and how much multi-dimensional.) The point is that Oracle can handle true multi-dimensional data structures with perfect aplomb; the only unusual feature of Oracle's implementation is where it stores the multi-dimensional arrays.
So, that's one myth explored and, in this case, exploded. Oracle was the victim here, but don't feel too sorry for the company; it's big enough and robust enough to look after itself in the rough and tumble of the commercial world. Take, for example, its recent advertising campaign, which boasts the headline "Oracle Database 10g #1 on Windows". As Oracle clearly intended us to understand, the '#1' tag here refers to the version of Oracle shown (Standard Edition One). It is certainly not intended to imply that Oracle is the best selling database engine on the Windows platform.
So, happily, Larry Ellison doesn't have to say to his marketing people "That's another fine myth you've gotten me into".