On Open Source data warehousing
I greeted the initial launch of Greenplum earlier this year with some scepticism (here and here) and was unconvinced about the future of an Open Source data warehousing model. Nevertheless, the company has pushed ahead with its plans and has made a number of significant advances.
To begin with, it has simplified its previously confusing nomenclature. There is now just Bizgres, which is the Open Source platform, and, sometime this autumn, there will be Bizgres MPP (massively parallel processing), which will be Greenplum's high-performance appliance (that is, with integrated hardware) and a chargeable version of Bizgres.
The latest version of Bizgres is 0.7, and it won't be until the MPP release that the product will come out of beta. That said, Greenplum has already introduced a number of new features into Postgres (upon which Bizgres is based), most notably through the introduction of an installer suite, data partitioning and bitmap scans (which act in a fashion similar to bitmap vectors in Oracle). At this stage partitioning capability is limited, being based on constraints only, but the company plans to expand these facilities in later releases. It also intends to introduce bitmap indexing. In the Bizgres MPP release the company will be adding an optimiser.
The other notable move forward is in the partnerships that Greenplum has been forming. Bizgres is now bundled with both an ETL (extract, transform and load) tool and a business intelligence front-end. Both of these are also Open Source products: the first, a Java-based product, comes from Kinetic Research and the second from JasperSoft. The other partnership that has been announced is with O'Reilly Connection, which will provide the equivalent of LinkedIn, but for engineers and developers interested in progressing this Open Source approach to data warehousing. In other words, it gives developers working in this area a way to identify and communicate with each other.
This is all very encouraging and Greenplum reports a lot of interest in what it is doing, even though, at the time of writing, it has yet to gain any customers. Customers, of course, are the key: with reference sites the company will be much better placed to take its vision into the market. The difficulty is that having customers does not necessarily lead to references: some companies shun IT-type publicity and, in any case, implementing a data warehouse is not a short-term exercise, so it may be a while before Greenplum can point to real customer benefits.
On the other hand, Greenplum is much more proactive in its marketing than some of the other new vendors in this space and, obviously, the Open Source message carries a premium. All of this is encouraging, but the jury is still out on the future of Open Source data warehousing.