Feeds

Building a data warehouse on parallel lines

Kognitio ergo something-for-nothing?

Beginner's guide to SSL certificates

Never look a gift horse in the mouth, especially if there are many of them running in parallel…

There are various structures we can use in a data warehouse – each with its pros and cons. For example, if you use a relational structure for the core of the warehouse then you gain very high flexibility but lose out on speed. Flexibility ensures that you can ask any question of the data and that you can drill down to the leaf level data - but the potentially poor performance is always a pain. You can index the structure but time and disk space usually limit the number of indexes you can apply which in turn reduces the flexibility. As the users’ analytical requirements change, so you need to update the indexing strategy which is often complex and expensive. You can, of course, elect to use a different structure, perhaps a dimensional one, whereupon you gain speed but lose more flexibility.

Speed or flexibility, flexibility or speed? It’s often a difficult call because most of the time we need both. If you find yourself in this situation then some charming guys at Kognitio are amazingly, mind-bogglingly eager to talk to you because they believe that this is precisely what their product WX2 promises. You, on the other hand, are cynically aware that promises are easy and that if there were a simple solution, someone would have thought of it years ago.

In fact, they did. We’ve know for years that parallel processing and in-memory data processing are both mind-boggling fast; the problem has always been one of cost-effective implementation. WX2 is an RDBMS (Relational DataBase Management System) implemented as a MPP (Massively Parallel Processing) system built out of commodity servers, typically blades. These blades form the nodes in what is called a VDA (Virtual Data Appliance). (Well, you didn’t expect to get to grips with a whole new technology without having to learn a whole new abbreviation did you?) Each node consists of one or more CPUs, a block of memory and some disks. The nodes don’t share resources so this is a shared-nothing architecture.

How does it work? Well, imagine a VDA with eight nodes. The data for analysis is distributed evenly (and randomly) across the disks in all eight nodes. Data can be loaded and then queried in parallel but happily, the software handles all of this automatically, so developers working with Kognitio are not required to think in parallel. As soon as the load completes, the data is available for querying; there is no pause while indexes are created for the simple reason that WX2 doesn’t use any. Instead, it manages to perform all of the queries in memory.

If a simple query comes in that only touches data from one node, then that node handles the query. Now imagine that a query comes in that requires (as most are likely to) data from several nodes. The data is read from the appropriate nodes and copied to the memory on one of the nodes, which then processes the data and returns the answer set. As we said above, this isn’t a new idea either; everyone knows that RAM is much, much faster than disk. The problem has always been to find an effective algorithm that can balance the massive storage capacity of disks against the speed of RAM, ensuring that the data is available for ad hoc querying as rapidly as possible. The trick is not the overall idea, it is the implementation. The Devil, as they say, is in the detail.

In addition, the architecture that Kognitio has elected to use has a very desirable side-effect: scalability. The company claims, for example, that “the query performance of a 100-server WX2 system with 10TB of data will be the same as that of a 10-server system with 1TB of data.” To put that another way, if you have a 20 node system which performs well with 100 users, then a 40 node system will perform equally well with 200 users. If you want a third way of looking at this, you can simply add nodes to compensate for more data, more users, or to gain performance. The company claims that its architecture means that there is no measurable overhead as nodes are added, because the “WX’s fully parallel architecture produces true linear scalability.”

There are, of course, already ways of achieving both speed and flexibility. We can, for example, create a relational data warehouse and a set of dimensional data marts. Kognitio argues that this is fine in some cases but that many companies find the solution too baroque. For a start, they need to employ developers for both relational and dimensional databases and in addition, this solution involves multiple copies of the analytical data, which makes auditing a nightmare.

And Kognitio, of course, isn’t the only company that is offering a novel implementation of data warehousing. Check out, for example Netezza and DATallegro [but remember that Kognitio is available as software-only – Ed].

Ultimately, all of these products break the "traditional" way in which data warehouses are built. Kognitio is aware that it can preach as much as it likes, but developers are always (and quite rightly) sceptical. So it has created testing facilities where it “will build you a data warehouse for free and let you analyse your data in days” – which you can find here. So, in this case at least, thinking outside the box doesn’t have to cost you anything but time.®

Beginner's guide to SSL certificates

More from The Register

next story
That dreaded syncing feeling: Will Microsoft EVER fix OneDrive?
Microsoft's long history of broken Windows sync
Mozilla, EFF, Cisco back free-as-in-FREE-BEER SSL cert authority
Let’s Encrypt to give HTTPS-everywhere a boost in 2015
SLURP! Flick your TONGUE around our LOLLIPOP – Google
Android 5 is coming – IF you're lucky enough to have the right gadget
Nokia's N1 fondleslab's HIDDEN BRILLIANCE: The 'Z Launcher'
Sugarcoating Android's Lollipop makes tab easier to swallow
Bug fixes! Get your APPLE BUG FIXES! iOS and OS X updates right here!
Yosemite fixes Wi-Fi hiccup, older iOS devices get performance boost
Facebook, working on Facebook at Work, works on Facebook. At Work
You don't want your cat or drunk pics at the office
Soz, web devs: Google snatches its Wallet off the table
Killing off web service in 3 months... but app-happy bonkers are fine
Meet Windows 10's new UI for OneDrive – also known as File Explorer
New preview build continues Redmond's retreat to the desktop
Microsoft: Your Linux Docker containers are now OURS to command
New tool lets admins wrangle Linux apps from Windows
prev story

Whitepapers

Choosing cloud Backup services
Demystify how you can address your data protection needs in your small- to medium-sized business and select the best online backup service to meet your needs.
Getting started with customer-focused identity management
Learn why identity is a fundamental requirement to digital growth, and how without it there is no way to identify and engage customers in a meaningful way.
Reg Reader Research: SaaS based Email and Office Productivity Tools
Read this Reg reader report which provides advice and guidance for SMBs towards the use of SaaS based email and Office productivity tools.
Simplify SSL certificate management across the enterprise
Simple steps to take control of SSL across the enterprise, and recommendations for a management platform for full visibility and single-point of control for these Certificates.
Saudi Petroleum chooses Tegile storage solution
A storage solution that addresses company growth and performance for business-critical applications of caseware archive and search along with other key operational systems.