In praise of FPGAs

Building an active data warehouse appliance


All data warehouse appliances have a massively parallel architecture in which there are multiple nodes that put processing as close as possible to the disk drives.

In Netezza's Performance Server these nodes are known as Snippet Processing Units (SPUs: pronounced, incidentally, to rhyme with Gnu rather than being a homonym for throwing up - though Teradata might think otherwise) where a snippet is executable machine code (compiled SQL).

In hardware terms, an SPU consists of an FPGA (field programmable gate array) and a PowerPC processor. Now, when vendors talk about using commodity hardware, neither of these is the sort of thing that springs to mind.

Nevertheless, FPGAs are widely used for graphics processing, video and audio streaming and other applications, while PowerPC chips are commonly employed in small form-factor devices. In particular, PowerPC chips have very low power requirements compared with PC chips. For example, a Netezza rack holding 5.5TB of user data (16.5TB actual capacity) has a power rating of just 4,180 watts and, of course, a commensurately low heat output. Given the current pressure on data centre power and cooling, this is a significant advantage, and one that Netezza has not made as much of as it might.
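Working through the rack figures quoted above shows what they imply per terabyte. These ratios are simple arithmetic on the quoted numbers, not figures published by Netezza:

```latex
\frac{16.5\,\text{TB (actual)}}{5.5\,\text{TB (user)}} = 3
\qquad
\frac{4{,}180\,\text{W}}{5.5\,\text{TB (user)}} = 760\,\text{W per TB of user data}
\qquad
\frac{4{,}180\,\text{W}}{16.5\,\text{TB (actual)}} \approx 253\,\text{W per TB of raw capacity}
```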

In Netezza's architecture the FPGA effectively acts as an event processing engine. Data is streamed from disk and passes through the query represented by the snippet, which has been loaded into the FPGA. I had not appreciated this until a recent visit to Netezza's user conference, but (if you are a regular reader) you can imagine my interest.
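The "query as a stream filter" idea can be illustrated in software. What follows is a minimal sketch, not Netezza's implementation. `compile_snippet`, `disk_scan` and the dictionary row format are hypothetical names chosen for illustration. The sketch turns a WHERE predicate and a SELECT list into an operator that rows flow through one at a time, so nothing is buffered before filtering:

```python
from __future__ import annotations

from typing import Callable, Iterable, Iterator

Record = dict  # hypothetical row representation: column name -> value


def compile_snippet(
    predicate: Callable[[Record], bool], columns: list[str]
) -> Callable[[Iterable[Record]], Iterator[tuple]]:
    """Build a streaming operator from a WHERE predicate and a SELECT list."""
    def run(stream: Iterable[Record]) -> Iterator[tuple]:
        for row in stream:          # rows arrive one at a time, as if off disk
            if predicate(row):      # restriction applied in-line
                yield tuple(row[c] for c in columns)  # projection
    return run


def disk_scan(rows: Iterable[Record]) -> Iterator[Record]:
    """Stand-in for rows being streamed from a disk drive."""
    yield from rows


# Roughly: SELECT id, amount FROM trades WHERE amount > 1000
rows = [
    {"id": 1, "amount": 500, "sym": "A"},
    {"id": 2, "amount": 1500, "sym": "B"},
    {"id": 3, "amount": 2500, "sym": "A"},
]
snippet = compile_snippet(lambda r: r["amount"] > 1000, ["id", "amount"])
result = list(snippet(disk_scan(rows)))
```

Here `result` holds `[(2, 1500), (3, 2500)]`. Only qualifying, projected rows leave the operator, which is the point of putting the filter next to the disk.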

While I was in Boston for Netezza's conference, I also met Kaskad Technologies. The company is a new entrant into the event processing space, and its surveillance application is helping to build the new, fully electronic Boston Stock Exchange, which goes live on 27 October.

Now, Kaskad has done some clever things with its event processing product, but in architectural terms the product is more or less comparable to those of the conventional vendors (Progress, StreamBase, Coral8 et al). What is much more interesting is that the company is already working on its second-generation event processing architecture. This will use multiple finite state machines running in parallel, hanging off a single bus.

In other words, instead of having a single large server, you have multiple small servers running in parallel. And how is Kaskad planning to implement these small servers? Using FPGAs. Given that FPGAs are the technology of choice for streaming applications, this obviously makes sense.
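Here is a minimal software sketch of "multiple finite state machines hanging off a single bus". It is not Kaskad's design. `SequenceFSM`, `Bus` and the `"order"`/`"cancel"` event types are hypothetical, chosen only for illustration. Each machine independently watches the broadcast event stream for a contiguous sequence of event types. It tracks the set of partial-match positions so that overlapping matches are not missed:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SequenceFSM:
    """Detects a contiguous sequence of event types; machines share no state."""
    name: str
    pattern: tuple[str, ...]
    active: set[int] = field(default_factory=set)  # positions of partial matches
    hits: list[int] = field(default_factory=list)  # event indices where a match completed

    def step(self, index: int, event: str) -> None:
        # Advance every partial match expecting this event, and also try
        # starting a fresh match at position 0.
        nxt = {p + 1 for p in self.active | {0} if self.pattern[p] == event}
        if len(self.pattern) in nxt:
            self.hits.append(index)
            nxt.discard(len(self.pattern))
        self.active = nxt


class Bus:
    """A single shared bus: every event is broadcast to every attached machine."""

    def __init__(self) -> None:
        self.machines: list[SequenceFSM] = []

    def attach(self, fsm: SequenceFSM) -> None:
        self.machines.append(fsm)

    def publish(self, index: int, event: str) -> None:
        # In hardware each machine could step in the same clock cycle; here
        # they step one after another, which gives the same result because
        # the machines are independent.
        for fsm in self.machines:
            fsm.step(index, event)


bus = Bus()
layering = SequenceFSM("order-cancel-order", ("order", "cancel", "order"))
churn = SequenceFSM("cancel-cancel", ("cancel", "cancel"))
bus.attach(layering)
bus.attach(churn)

events = ["order", "order", "cancel", "order", "cancel", "cancel"]
for i, e in enumerate(events):
    bus.publish(i, e)
```

After the loop, `layering.hits` is `[3]` and `churn.hits` is `[5]`. Adding another pattern means attaching another machine, not enlarging a single server, which mirrors the scale-out argument above.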

Now, readers may recall that I recently suggested a better way to do what Teradata calls active data warehousing (whereby you use change data capture to trickle feed the warehouse and then run actionable queries and processes). Rather than doing it all inside the warehouse, you would put an event processing engine in front of a data warehouse appliance. In fact, Sybase is doing exactly this with its Risk Analytics Platform (albeit that Sybase IQ is not an appliance), and it is talking to a number of event processing vendors about putting their products in front of it.
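The "event processing engine in front of the warehouse" arrangement can be sketched as follows. This is a minimal illustration under stated assumptions. An in-memory SQLite database stands in for the warehouse, and `FrontEnd`, `on_change` and the quantity threshold rule are hypothetical. Each change-data-capture record is acted on immediately and then trickle-fed into the warehouse:

```python
import sqlite3


def make_warehouse() -> sqlite3.Connection:
    # In-memory SQLite stands in for the warehouse; purely illustrative.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trades (seq INTEGER, sym TEXT, qty INTEGER)")
    return conn


class FrontEnd:
    """Hypothetical event processing stage placed in front of the warehouse."""

    def __init__(self, conn: sqlite3.Connection, qty_limit: int) -> None:
        self.conn = conn
        self.qty_limit = qty_limit
        self.alerts: list = []

    def on_change(self, seq: int, sym: str, qty: int) -> None:
        # 1. Act on the event as it arrives, before any warehouse query runs.
        if qty > self.qty_limit:
            self.alerts.append((seq, sym, qty))
        # 2. Trickle-feed the same change record into the warehouse.
        self.conn.execute("INSERT INTO trades VALUES (?, ?, ?)", (seq, sym, qty))


conn = make_warehouse()
fe = FrontEnd(conn, qty_limit=10_000)
for change in [(1, "ABC", 500), (2, "XYZ", 25_000), (3, "ABC", 12_000)]:
    fe.on_change(*change)
conn.commit()
rows = conn.execute("SELECT COUNT(*), SUM(qty) FROM trades").fetchone()
```

Here `fe.alerts` holds the two large changes as they arrive, while the warehouse still receives all three records (`rows` is `(3, 37500)`) for later analytical queries.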

However, the use of FPGAs opens up another possibility. Why not have different types of FPGA within the same architecture? Some FPGAs could process data streamed from disk, while others processed data streamed in from external environments off a bus.

In other words, you could build an active data warehouse appliance or, looking at it another way, an integrated event processing/warehouse environment. Given that you often want to access historic information for event processing purposes, this could be an architecture that makes sense.
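A minimal sketch of that integrated idea follows. It assumes two kinds of processing unit. A hypothetical `disk_unit` makes one streaming pass over stored history to compute a per-symbol baseline, and a hypothetical `bus_unit` checks live events arriving off the bus against that baseline. The "3 × historic mean" rule is an invented example, and in a real appliance the two kinds of unit would run concurrently in hardware rather than one after the other:

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def disk_unit(history: Iterable[tuple[str, int]]) -> dict[str, float]:
    """Disk-side unit: one streaming pass over stored rows -> per-symbol mean qty."""
    totals: defaultdict[str, int] = defaultdict(int)
    counts: defaultdict[str, int] = defaultdict(int)
    for sym, qty in history:
        totals[sym] += qty
        counts[sym] += 1
    return {sym: totals[sym] / counts[sym] for sym in totals}


def bus_unit(
    live: Iterable[tuple[str, int]], baseline: dict[str, float], factor: float
) -> Iterator[tuple[str, int, float]]:
    """Bus-side unit: flag live events far above that symbol's historic mean."""
    for sym, qty in live:
        mean = baseline.get(sym)
        if mean is not None and qty > factor * mean:
            yield sym, qty, mean


history = [("ABC", 100), ("ABC", 300), ("XYZ", 50)]
baseline = disk_unit(history)
live = [("ABC", 450), ("ABC", 900), ("XYZ", 60), ("NEW", 1000)]
flags = list(bus_unit(live, baseline, factor=3.0))
```

With these inputs, `baseline` is `{"ABC": 200.0, "XYZ": 50.0}` and only `("ABC", 900, 200.0)` is flagged. Symbols with no history (`"NEW"`) are skipped rather than guessed at. That is the sense in which the live stream needs the historic data to be useful.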

Copyright © 2006, IT-Analysis.com
