Netezza surprises with technical capabilities

FPGAs, zonemaps, and a strong roadmap


Comment I have recently returned from Netezza's second annual conference. This was well attended, with nearly all of the company's customers (around 75) being represented, as well as a significant number of both prospects and partners.

It was very (to use a technical term) buzzy and there was a degree of enthusiasm that I have rarely encountered. However, what was most interesting for me was the number of things I had not previously appreciated about Netezza's technical capabilities. And, of course, its roadmap for the future (though I can't say too much about that).

To begin with, there is the question of indexes. Data warehouse appliances in general, and Netezza in particular, tend to be typecast by detractors as only being good for large table scans, because they do not support indexes and therefore cannot run complex joins.

However, in the case of Netezza at any rate, this is misleading, because it uses what might be described as an anti-index, which is called a zonemap. With a zonemap you load, say, sales by time, and the zonemap breaks the relevant data down into blocks, storing the details of the first and last record in each block (so the overhead is much lower than that of an index).

What this means is that when you run a query you only read the blocks that contain the data you are interested in, ignoring all the other blocks. This ability to limit the data you read means that joins are much more effective than would otherwise be the case. In its roadmap, Netezza described future approaches that will further reduce the amount of data you need to read.
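The block-skipping idea can be illustrated with a minimal Python sketch. This is not Netezza's implementation: the names `Zone`, `build_zonemap` and `range_query`, the block size and the sample sales data are all hypothetical, and the sketch assumes the data is loaded in time order so that the first and last records of a block bound its contents.

```python
from dataclasses import dataclass
from typing import List, Tuple

Row = Tuple[int, float]  # (day, sale_amount) -- illustrative schema only


@dataclass
class Zone:
    lo: int          # smallest key in the block (first record if sorted)
    hi: int          # largest key in the block (last record if sorted)
    rows: List[Row]  # the block's data, standing in for a disk block


def build_zonemap(rows: List[Row], block_size: int) -> List[Zone]:
    """Split rows into fixed-size blocks and record each block's key range."""
    zones = []
    for i in range(0, len(rows), block_size):
        block = rows[i:i + block_size]
        keys = [key for key, _ in block]
        zones.append(Zone(min(keys), max(keys), block))
    return zones


def range_query(zones: List[Zone], start: int, end: int):
    """Return rows with start <= day <= end, plus the number of blocks read."""
    hits: List[Row] = []
    blocks_read = 0
    for zone in zones:
        # Skip any block whose key range cannot overlap the query range.
        if zone.hi < start or zone.lo > end:
            continue
        blocks_read += 1
        hits.extend(row for row in zone.rows if start <= row[0] <= end)
    return hits, blocks_read


# Hypothetical data: one sale per day for 100 days, loaded in time order.
sales = [(day, 10.0 * day) for day in range(1, 101)]
zones = build_zonemap(sales, block_size=10)
hits, blocks_read = range_query(zones, start=35, end=42)
print(f"read {blocks_read} of {len(zones)} blocks, found {len(hits)} rows")
```

Only the blocks whose recorded range overlaps the query are touched; the rest are never read, which is the saving described above.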

Another interesting thing to come out of the conference was that a number of Netezza customers have stopped using aggregates as a result of implementing Netezza. For example, Carphone Warehouse told me that it was both faster and more accurate to calculate directly from the raw data.

As aggregates are a major issue for database administrators, being able to get rid of them (or, at least, minimise their use) is a significant benefit. Not that Netezza eschews aggregates altogether. More than one user employs a data warehouse appliance (not only from Netezza) as an aggregating engine in front of a third-party enterprise data warehouse. I will discuss this further in a subsequent article.

And while talking about enterprise data warehouses (EDW), there are several arguments put against using a data warehouse appliance as an EDW. The first is that you can't use an appliance for complex joins but, as discussed above, this is less and less true, at least as far as Netezza is concerned.

Secondly, there is the issue that the large EDW vendors provide pre-built data models. Here, one of the things that Netezza has not made much of is the fact that it has partners that provide exactly this sort of capability (typically built on either a star or snowflake schema).

And, thirdly, there is the question of managing mixed workloads. In this last case, Netezza offers guaranteed resource allocation (floors but not ceilings yet), short query bias, materialised views, and prioritisation.

Another area in which Netezza has been hiding its light under a bushel is in the matter of FPGAs (field programmable gate arrays). FPGAs are used to process data as it is streamed off disk. Note that this is important to understand. Most data warehouse appliances (and, indeed, conventional products) use a caching architecture whereby data is read from disk and then held in cache for processing. Netezza, on the other hand, uses an approach that queries the data as it comes off disk before passing the results on to memory. In other words it uses a streaming architecture in which the data is streamed through the queries (whose programs have been loaded into the FPGA) rather than being stored (even if in memory) and then queried.
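The contrast between the two architectures can be sketched in a few lines of Python. This is an illustration only, with generators standing in for the FPGA: `read_from_disk`, `cached_query`, `streaming_query` and the sample rows are hypothetical names and data, not anything from Netezza.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Row = Tuple[int, str, float]  # (id, region, amount) -- illustrative schema


def read_from_disk(n: int) -> Iterator[Row]:
    """Hypothetical stand-in for rows arriving from disk one at a time."""
    for i in range(n):
        yield (i, "north" if i % 2 == 0 else "south", float(i))


def cached_query(rows: Iterable[Row], predicate: Callable[[Row], bool]):
    """Caching style: hold every row in memory first, then query the cache."""
    cache = list(rows)
    return [row for row in cache if predicate(row)], len(cache)


def streaming_query(rows: Iterable[Row],
                    predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """Streaming style: test each row as it arrives, pass on only matches."""
    for row in rows:
        if predicate(row):
            yield row


def big_north(row: Row) -> bool:
    return row[1] == "north" and row[2] >= 990.0


cached_result, rows_buffered = cached_query(read_from_disk(1000), big_north)
streamed_result: List[Row] = list(streaming_query(read_from_disk(1000), big_north))

print(f"caching: buffered {rows_buffered} rows to return {len(cached_result)}")
print(f"streaming: passed {len(streamed_result)} rows on to memory")
```

Both approaches return the same rows; the difference is how much data has to be held before the query is applied.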

There are several points to make about this. The first is that you can get much better performance when using this sort of approach than when using a conventional one. For example, it is stream-based processing that is used for algorithmic trading, where processing requirements are of the order of 150,000 transactions per second.

The second is that FPGAs are the natural way of handling streaming environments. For example, they are widely used for voice and video streaming. They are not yet used for event stream processing, but we know of one vendor that plans to do exactly that.

In turn, what this means is that FPGAs are very much a commodity item. Those of us working in more conventional environments may not think of FPGAs like that, but they are as much of a commodity as, say, an Intel processor.

And talking about processors, the other thing that Netezza does that may seem odd to some people is that it employs a PowerPC chip rather than said Intel (or AMD). Again, this is a commodity device that is widely used in small footprint devices, primarily because of its low power consumption.

To be specific, a Netezza Snippet Processing Unit (where a snippet is the compiled SQL query that data is streamed through) requires just 30 watts. A complete Netezza rack with 112 of these and 16.5TB of disks (with 5.5TB of user data) requires little more than 4kW and produces 12,000 BTU heat output. Given the power and cooling issues afflicting most data centres today, this is a substantial advantage, as are the reduced floor space requirements.
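As a rough cross-check of these figures, the arithmetic can be worked through in Python. The sketch assumes the heat output is meant per hour and uses the standard conversion of roughly 3.412 BTU/h per watt; the variable names are illustrative, and the quoted numbers are evidently rounded, so only approximate agreement should be expected.

```python
# Standard conversion factor: 1 watt is about 3.412 BTU per hour.
WATTS_TO_BTU_PER_HOUR = 3.412142

# Figures quoted in the article.
spu_count = 112
watts_per_spu = 30
rack_watts_quoted = 4000        # "little more than 4kW"
raw_disk_tb = 16.5
user_data_tb = 5.5

# Power drawn by the Snippet Processing Units alone.
spu_watts = spu_count * watts_per_spu

# Heat equivalents (assumption: the quoted BTU figure is per hour).
spu_btu_per_hour = spu_watts * WATTS_TO_BTU_PER_HOUR
rack_btu_per_hour = rack_watts_quoted * WATTS_TO_BTU_PER_HOUR

# Share of raw disk capacity that holds user data.
user_fraction = user_data_tb / raw_disk_tb

print(f"SPUs alone: {spu_watts} W, about {spu_btu_per_hour:,.0f} BTU/h")
print(f"Whole rack at 4 kW: about {rack_btu_per_hour:,.0f} BTU/h")
print(f"User data is {user_fraction:.0%} of raw disk capacity")
```

On these assumptions the quoted 12,000 BTU sits between the figure for the SPUs alone and the figure for a full 4kW, which is consistent with rounded numbers.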

Returning to FPGAs for a moment, their price and performance are following a curve similar to that of processors. It is expected that performance and price will both improve by a factor of five by 2010, as will the amount of logic that you can put on an FPGA. This last is particularly important because it will enable Netezza to introduce even more functionality into the FPGA in the future.

Even with the current FPGAs, Netezza plans to introduce features that will increase raw scan-rate performance, tactical query performance, and advanced analytic performance. The advanced analytic capabilities will be made available to partners rather than end users and will allow predictive analytics vendors (like SPSS or SAS) to embed scoring capabilities (say) directly into the FPGA, which should provide significant performance advantages.

Another potential use of the functionality embedded in the FPGA would be to implement column-level encryption, which would be useful for companies in the data aggregation and resale market, for example, because you could use different encryption techniques for each customer's data.

Encryption generally is not available and is not currently on the roadmap. While I would like to see it, it is arguably unnecessary: given the structure of a Netezza appliance, you would need some seriously good hacking skills to read a Netezza disk, even if you could get at one. Column-level encryption on its own may therefore be good enough.

To conclude, I was surprised by this conference, not just by the enthusiasm of the attendees but also by some of the functionality that Netezza can offer, which I don't think it has done a good job of explaining to the market. It has, for obvious reasons, concentrated on performance, price and reduced cost of ownership but, to take TCO, it has tended to focus on the removal of indexes and tuning and hasn't discussed its advantages when it comes to aggregates.

Similarly, it hasn't really explained why using FPGAs is a good idea, it hasn't made it clear that zonemaps are a form of anti-index, and it hasn't talked much about its advantages in the data centre.

Given all of this, and adding in the rich set of new features in the company's roadmap (a number of which I have not mentioned), there is no reason to expect Netezza to do anything but go from strength to strength.

Copyright © 2006, IT-Analysis.com
