Another approach to federated query
On Callixa and data agents
The vast majority of vendors supplying federated or EII (enterprise information integration) platforms employ the same paradigm: a view or virtual schema builder at the front end, with an optimiser and a cache in the platform. There is not much disagreement about the front end, but there are alternative approaches to how you optimise the environment.
One company that differentiates itself from the rest of the herd is Callixa. Callixa has a tragic history but it is worth recounting. The company was founded at the end of the last century and claims to have invented the term EII. Certainly, it introduced its first product in 2000. However, this product didn't perform well enough for the financial services market that Callixa was aiming at. The company therefore went away, reconsidered the architecture of the product and re-designed it.
Unfortunately, the company chose 9/11 as the day to unleash its new product on its potential clients. Worse, it chose the World Trade Center to do so. Callixa lost so many of its people and resources that the investors decided it would be best just to close the company down; but, about a year ago, a group of the original executives got together and re-founded the company.
The Callixa approach differs significantly from other solutions in a number of respects. To begin with, it is based on a grid architecture that uses a shared-nothing distributed approach and deploys performance features such as multi-threading, pipeline parallelism and partitioning. Actually, that is not especially different: perhaps in degree but not in kind. Where it is different is that it deploys what the company calls "data agents".
Callixa's contention, and it has considerable merit, is that the big issue with federated queries is determining what work you push down to the source databases and what you do in the platform and, in particular, where you execute joins. This is where data agents come in. These may be located anywhere on the network and you can also have multiple agents on any one system. For example, you would typically have a different agent assigned to each different partition. Callixa argues (and it is probably right) that these agents, working in conjunction with the distributed optimiser, will result in superior performance.
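To make the join-placement question concrete, here is a minimal sketch of how an optimiser might choose where to run a two-table join. It is not Callixa's algorithm: the `Table` type, the `place_join` function, the site names and the cost model (simply counting rows shipped across the network) are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    name: str
    site: str   # hypothetical site where the table (and an agent) lives
    rows: int   # estimated rows left after local filters are pushed down


def place_join(left: Table, right: Table, result_rows: int,
               hub: str = "platform") -> tuple[str, int]:
    """Return (site, rows_shipped) for the cheapest place to run the join.

    Assumed cost model: only rows moved over the network count, and the
    join result must always end up at the hub.
    """
    if left.site == right.site:
        # Both inputs are co-located: join at the source, ship only the result.
        return left.site, result_rows
    candidates = {
        hub: left.rows + right.rows,           # pull both inputs to the platform
        left.site: right.rows + result_rows,   # ship right input to left's agent
        right.site: left.rows + result_rows,   # ship left input to right's agent
    }
    site = min(candidates, key=candidates.get)
    return site, candidates[site]


trades = Table("trades", site="new_york", rows=5_000_000)
accounts = Table("accounts", site="london", rows=20_000)
print(place_join(trades, accounts, result_rows=50_000))
```

Under these assumed numbers the cheapest plan ships the small accounts table to the agent next to the large trades table and returns only the joined result, rather than dragging five million rows into the platform.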
However, Callixa's isn't the only alternative to conventional data federation. A more radical approach can be seen in Sunopsis's Data Hub. In this case, the company's argument is more straightforward: it suggests that the whole process will be much faster and easier if you simply replicate all the relevant data to a separate data store, and then maintain that in a synchronised state. Then you can simply query against the Data Hub and you don't need to worry about distributed optimisation, caching, and data agents. Interestingly, Sybase's recently released Dynamic ODS isn't a million miles from this sort of solution and it is easy to imagine this product being developed into something comparable to that of Sunopsis.
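The replicate-and-synchronise idea can be sketched in a few lines. This is an illustration only, not how Sunopsis's Data Hub is built: it uses an in-memory SQLite database as a stand-in for the hub, and the table names, the `make_hub` and `sync` helpers and the sample rows are all hypothetical.

```python
import sqlite3


def make_hub() -> sqlite3.Connection:
    """Create the separate data store that holds copies of the source data."""
    hub = sqlite3.connect(":memory:")
    hub.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, owner TEXT)")
    hub.execute(
        "CREATE TABLE trades (id INTEGER PRIMARY KEY, account_id INTEGER, amount REAL)"
    )
    return hub


def sync(hub: sqlite3.Connection, table: str, rows: list) -> None:
    """Apply a batch of new or changed source rows (upsert by primary key)."""
    if not rows:
        return
    placeholders = ", ".join("?" for _ in rows[0])
    hub.executemany(f"INSERT OR REPLACE INTO {table} VALUES ({placeholders})", rows)
    hub.commit()


hub = make_hub()
sync(hub, "accounts", [(1, "Alice"), (2, "Bob")])        # initial load, source A
sync(hub, "trades", [(10, 1, 250.0), (11, 2, 75.0)])     # initial load, source B
sync(hub, "trades", [(11, 2, 100.0)])                    # later change at the source

# The join now runs entirely inside the hub: no distributed plan is needed.
totals = hub.execute(
    "SELECT a.owner, SUM(t.amount) FROM accounts a "
    "JOIN trades t ON t.account_id = a.id "
    "GROUP BY a.owner ORDER BY a.owner"
).fetchall()
print(totals)
```

The trade-off is visible even at this scale: the query itself becomes trivial, but the hub is only as current as the last `sync` call.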
Note that the development of these alternative or additional approaches is not in the least surprising. It is typical of expanding markets: sometimes the alternatives prove more successful than the mainstream approach, sometimes they don't. Time will tell.