Wanna know how to rate a data warehouse appliance?
A handy list of rules
One of the major discussion points at Bloor Research's recent conference on "Data Warehousing: the rise of the appliance" was the set of rules (though they might equally be regarded as reference points) that might apply to data warehouse appliances as opposed to enterprise data warehouses.
I presented an initial set of rules based on my own research and these have been subsequently modified in light of the comments made by people at the conference. These are detailed below and, as will be seen, are broken down into three categories: generic rules for appliances, specific rules for data warehouse appliances, and specific rules for enterprise data warehouses. These rules exclude general considerations that are applicable to all sorts of offerings such as security, integration with third party vendors, support for open standards, encryption, load and unload speeds, and so on. The list of rules, with comments pertinent to data warehouse appliances, is as follows:
Rule 0: An appliance may be a multi-purpose device but exists within a limited context—currently, data warehouse appliances may be good for fast table scans, complex analytics and as aggregation engines, for example - thus they certainly do more than one thing; but they are only just moving into the enterprise data warehouse space. Context is a matter of perspective.
Rule 1: You plug it in and it goes - how quickly it goes may be variable. For example, Netezza set up a system at our conference in just 15 minutes. On the other hand, IBM reckons that it takes around six hours to set up a BCU. Again, the extent to which you consider either of these figures as "plug it in and it goes" is a matter of viewpoint. Note that you may still require a special power supply to make it go.
Rule 2: It is simple to implement, administer and maintain - this is a no-brainer; administration here refers to the appliance rather than the database.
Rule 3: No (minimal) tuning is required - ditto. It is worth noting that no tuning is difficult to achieve unless the context of your product is very limited. For example, you can have no indexes and no aggregates but such things as prioritisation and scheduling arguably involve tuning. Note that the sort of autonomics provided by the likes of IBM and Oracle at least makes index tuning simple.
Rule 4: It is data centre friendly - less footprint, lower power requirements and reduced cooling needs are all increasingly important and appliances tend in this direction. We all want more for less.
Data Warehouse Appliance (DWA) rules
Rule 5: In a DWA the hardware and software have been designed to optimise each other - this is the ideal position if you want to get maximum performance, though some vendors only optimise the hardware or the software and not both. The downside of optimising both (which I do not consider particularly significant) is that you don't have a choice of hardware platform: do you care?
Rule 6: A DWA attempts to minimise all potential system bottlenecks - in theory, any system can have an I/O, CPU, memory or interconnect bottleneck though I/O is by far the most common. Different vendors in the market use different approaches to overcome their point(s) of weakness, which may impact on their performance in different environments. This has important implications for both live running and proofs of concept, which I discuss further in my (forthcoming) article "Data warehouse appliances: designing a proof of concept".
Rule 7: A DWA is easily upgradeable - systems should be easily upgradeable at the component, disk and software levels, and the need to replace systems should be absolutely minimised. Note that this is less of an issue now than it used to be.
Rule 8: A DWA provides high availability - there should be no single point of failure: mirrored disks, dual interconnects, failover and so forth should all be implemented. Note that if you are building your own solution based on a software-only appliance then you should not attempt to cut any corners here.
Enterprise Data Warehouse (EDW) rules (providing functionality beyond a DWA):
Rule 9: An EDW supports a mixed query workload - more and more users are accessing data warehouses with a wider and wider range of queries (and query types). DWA suppliers are starting to develop capabilities in this area but most such solutions are limited in their capability today, though some more than others. Netezza, for example, has a number of facilities in this area, such as short query bias (which DATAllegro also offers), scheduling, prioritisation, guaranteed resource allocation and so on.
Rule 10: An EDW is scalable both in capacity and for users, with maximised concurrency. Scalability for users is a significant issue for appliance vendors at present; typically, user scalability for a DWA is measured in hundreds at best, rather than thousands. On the capacity side there is not such an issue: Netezza has offerings up to 100TB while DATAllegro can grow significantly larger than this.
Rule 11: An EDW supports real-time data loading and operational and actionable (process aware) BI—this is not something that appliance vendors are much involved with right now, though this is, at least in part, about partnerships.
Rule 12: An EDW handles unstructured (text and XML) data as well as structured data—none of the appliance vendors can do this yet. Both this rule and rule 11 will become increasingly important over the next five years, in my opinion.
Looking at the various offerings in the market in terms of this reference model is quite interesting. It is quite clear that the appliance vendors are working down this stack, while traditional suppliers already meet rules 9 to 12 (some more than others perhaps) but are working at introducing the earlier rules. Sybase, for example, which presented at our conference, discussed Sybase IQ as being appliance-like (because of its performance, low storage requirements and so on) while IBM is going down the same path (in a different way) with the introduction of the BCU.
Having said that, it is important to appreciate that what counts as an EDW is in the eye of the beholder. I certainly know of users that claim to have a Netezza EDW: they don't require the functionality of rules 11 or 12, and Netezza is scalable enough, and has sufficient mixed query capability, for their needs. This is potentially true of other appliance vendors also.
In principle, one could rate all vendors against these rules, add in the generic considerations not discussed, apply relevant weighting factors and come up with a league table of results. Maybe I'll get around to doing that in due course.
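As a minimal sketch of how such a league table might be computed, assume each vendor has been given a score per rule and each rule a weighting factor. Everything here is hypothetical and for illustration only: the function name `league_table`, the rule keys, the vendor names, the 0 to 5 scoring scale, and all of the numbers. None of it reflects an actual assessment of any product.

```python
from typing import Dict, List, Tuple


def league_table(
    scores: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Rank vendors by weighted average score, highest first.

    scores  maps vendor name -> {rule name -> score}
    weights maps rule name -> weighting factor
    A rule missing from a vendor's scores counts as 0.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")

    table = []
    for vendor, rule_scores in scores.items():
        # Weighted sum over every rule that has a weight.
        weighted = sum(w * rule_scores.get(rule, 0.0) for rule, w in weights.items())
        table.append((vendor, weighted / total_weight))

    # Highest weighted average first.
    table.sort(key=lambda row: row[1], reverse=True)
    return table


# Hypothetical weights: this buyer cares most about mixed query workloads.
example_weights = {"rule_1": 1.0, "rule_9": 2.0, "rule_10": 1.0}

# Hypothetical scores on an assumed 0-5 scale; not real vendor ratings.
example_scores = {
    "Vendor A": {"rule_1": 5.0, "rule_9": 2.0, "rule_10": 3.0},
    "Vendor B": {"rule_1": 2.0, "rule_9": 4.0, "rule_10": 4.0},
}

for vendor, score in league_table(example_scores, example_weights):
    print(f"{vendor}: {score:.2f}")
```

Changing the weights changes the ranking, which is the point: a buyer who values rule 1 over rule 9 would reorder the table, so any such league table is only as meaningful as the weighting behind it.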
Copyright © 2006, IT-Analysis.com