Building a data warehouse on parallel lines

Kognitio ergo something-for-nothing?


Never look a gift horse in the mouth, especially if there are many of them running in parallel…

There are various structures we can use in a data warehouse, each with its pros and cons. For example, if you use a relational structure for the core of the warehouse, you gain very high flexibility but lose out on speed. Flexibility means that you can ask any question of the data and drill down to the leaf-level data, but the potentially poor performance is always a pain. You can index the structure, but time and disk space usually limit the number of indexes you can apply, which in turn reduces the flexibility. As the users’ analytical requirements change, you need to update the indexing strategy, which is often complex and expensive. You can, of course, elect to use a different structure, perhaps a dimensional one, whereupon you gain speed but lose more flexibility.

Speed or flexibility, flexibility or speed? It’s often a difficult call because most of the time we need both. If you find yourself in this situation then some charming guys at Kognitio are amazingly, mind-bogglingly eager to talk to you because they believe that this is precisely what their product WX2 promises. You, on the other hand, are cynically aware that promises are easy and that if there were a simple solution, someone would have thought of it years ago.

In fact, they did. We’ve known for years that parallel processing and in-memory data processing are both mind-bogglingly fast; the problem has always been one of cost-effective implementation. WX2 is an RDBMS (Relational DataBase Management System) implemented as an MPP (Massively Parallel Processing) system built out of commodity servers, typically blades. These blades form the nodes in what is called a VDA (Virtual Data Appliance). (Well, you didn’t expect to get to grips with a whole new technology without having to learn a whole new abbreviation, did you?) Each node consists of one or more CPUs, a block of memory and some disks. The nodes don’t share resources, so this is a shared-nothing architecture.

How does it work? Well, imagine a VDA with eight nodes. The data for analysis is distributed evenly (and randomly) across the disks in all eight nodes. Data can be loaded and then queried in parallel but, happily, the software handles all of this automatically, so developers working with Kognitio are not required to think in parallel. As soon as the load completes, the data is available for querying; there is no pause while indexes are created, for the simple reason that WX2 doesn’t use any. Instead, it manages to perform all of the queries in memory.
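To make the "evenly (and randomly)" part concrete, here is a minimal Python sketch of dealing rows out across eight nodes. It is an illustration only, not Kognitio's code: the `distribute` function, the row layout and the fixed seed are all assumptions made for the example.

```python
import random

NUM_NODES = 8  # the eight-node VDA from the example above


def distribute(rows, num_nodes=NUM_NODES, seed=0):
    """Assign each row to a randomly chosen node (hypothetical helper).

    Returns one list of rows per node. Nothing is shared between
    the lists, mirroring a shared-nothing layout.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[rng.randrange(num_nodes)].append(row)
    return nodes


# Made-up sample data: 80,000 order rows.
rows = [{"order_id": i, "amount": i % 100} for i in range(80_000)]
nodes = distribute(rows)
sizes = [len(node_rows) for node_rows in nodes]
print(sizes)  # each node ends up holding roughly an eighth of the rows
```

Random placement does not give every node exactly the same number of rows, but with enough data the slices come out close to equal, which is what lets each node do a similar share of the work.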

If a simple query comes in that only touches data from one node, then that node handles the query. Now imagine that a query comes in that requires (as most are likely to) data from several nodes. The data is read from the appropriate nodes and copied to the memory on one of the nodes, which then processes the data and returns the answer set. As we said above, this isn’t a new idea either; everyone knows that RAM is much, much faster than disk. The problem has always been to find an effective algorithm that can balance the massive storage capacity of disks against the speed of RAM, ensuring that the data is available for ad hoc querying as rapidly as possible. The trick is not the overall idea, it is the implementation. The Devil, as they say, is in the detail.
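The flow just described (read from the relevant nodes, copy to one node's memory, process there) can be sketched in a few lines of Python. This is a toy model under stated assumptions, not WX2's actual algorithm: `scan_node` and `run_query` are hypothetical names, a thread pool stands in for genuinely separate servers, and the rows are dealt round-robin rather than randomly so that the result is reproducible.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_node(node_rows, predicate):
    """Full scan of one node's local rows; no index is consulted."""
    return [row for row in node_rows if predicate(row)]


def run_query(nodes, predicate, aggregate):
    """Scan every node, gather the matches in one place, then aggregate."""
    # Each node scans only its own slice of the data.
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(lambda n: scan_node(n, predicate), nodes))
    # The matching rows are copied into the memory of a single node ...
    gathered = [row for part in partials for row in part]
    # ... which processes them and returns the answer set.
    return aggregate(gathered)


# Toy data: 1,000 rows spread over eight nodes.
nodes = [[] for _ in range(8)]
for i in range(1000):
    region = "north" if i % 2 == 0 else "south"
    nodes[i % 8].append({"region": region, "amount": i})

total_north = run_query(
    nodes,
    predicate=lambda row: row["region"] == "north",
    aggregate=lambda rows: sum(row["amount"] for row in rows),
)
print(total_north)  # 249500
```

Even in this toy version the hard part is visible: deciding how much to filter on each node before anything is copied, and which node should receive the result, is where a real implementation earns its keep.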

In addition, the architecture that Kognitio has elected to use has a very desirable side-effect: scalability. The company claims, for example, that “the query performance of a 100-server WX2 system with 10TB of data will be the same as that of a 10-server system with 1TB of data.” To put that another way, if you have a 20-node system which performs well with 100 users, then a 40-node system will perform equally well with 200 users. If you want a third way of looking at this, you can simply add nodes to compensate for more data, more users, or to gain performance. The company claims that its architecture means that there is no measurable overhead as nodes are added, because the “WX’s fully parallel architecture produces true linear scalability.”
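The arithmetic behind those claims is easy to check. The sketch below assumes the simplest possible model, in which performance depends only on the load each node carries; that model is our reading of the "linear scalability" claim, not a measurement, and `load_per_node` is a hypothetical helper.

```python
from fractions import Fraction


def load_per_node(total, nodes):
    """Work each node carries if the total is spread evenly (assumed model)."""
    return Fraction(total, nodes)


# Data volume: 10TB on 100 servers versus 1TB on 10 servers.
big = load_per_node(10, 100)
small = load_per_node(1, 10)
print(big, small)  # 1/10 1/10 -> a tenth of a terabyte per server in both cases

# Users: 100 users on 20 nodes versus 200 users on 40 nodes.
users_20 = load_per_node(100, 20)
users_40 = load_per_node(200, 40)
print(users_20, users_40)  # 5 5 -> five users per node in both cases
```

If per-node load really is all that matters, the two systems in each pair should behave identically. Whether coordination costs stay negligible as nodes are added is exactly the part you would want to test with your own data.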

There are, of course, already ways of achieving both speed and flexibility. We can, for example, create a relational data warehouse and a set of dimensional data marts. Kognitio argues that this is fine in some cases but that many companies find the solution too baroque. For a start, they need to employ developers for both relational and dimensional databases and in addition, this solution involves multiple copies of the analytical data, which makes auditing a nightmare.

And Kognitio, of course, isn’t the only company that is offering a novel implementation of data warehousing. Check out, for example, Netezza and DATallegro [but remember that Kognitio is available as software-only – Ed].

Ultimately, all of these products break the "traditional" way in which data warehouses are built. Kognitio is aware that it can preach as much as it likes, but developers are always (and quite rightly) sceptical. So it has created testing facilities where it “will build you a data warehouse for free and let you analyse your data in days”. So, in this case at least, thinking outside the box doesn’t have to cost you anything but time.®
