Metadata is not enough

Doesn't provide the full picture

Comment Classically, at least in software terms, reverse engineering is the ability provided by a data modelling tool to inspect an existing database schema and derive entities and relationships from that schema. Hence the "reverse": more usually, you use such a tool to build entity-relationship diagrams from which you can generate a schema.
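To make the idea concrete, here is a minimal sketch of schema-only reverse engineering, assuming an SQLite database and Python's standard sqlite3 module. The function name and the two example tables are hypothetical, and a real modelling tool does a good deal more; the point is only that everything it learns comes from the catalogue, not from the data.

```python
import sqlite3


def reverse_engineer(conn):
    """Return (entities, relationships) read from the schema catalogue only."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    relationships = []
    for table in tables:
        # Each row is (id, seq, parent_table, from_column, to_column, ...).
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            relationships.append((table, fk[3], fk[2], fk[4]))
    return tables, relationships


# Hypothetical schema, used purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(id)
    );
""")
entities, relationships = reverse_engineer(conn)
print(entities)
print(relationships)
```

Only relationships that were declared as foreign keys show up here; anything the schema does not state is invisible to this approach.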

Now, reverse engineering is fine if you simply want to understand the entities and relationships that underpin a particular database for, say, the purposes of extending or modifying the relevant schema. However, it is hopeless if what you want to do is to understand all of the relationships that exist within the database or, even worse, understand relationships that span databases.

So, what do you do if you want to understand all the relationships that exist across your data, perhaps to support a data governance initiative, the implementation of master data management, or any of a variety of data integration purposes?

Traditionally, you start by analysing your metadata and then you reverse engineer it, or you profile it, or you do whatever you like with it, but it won't really work because the metadata available to you is very limited. To put this another way: there are lots of relationships between data elements that lie outside the formal structure of the data mandated by the database schema. For example, CASE statements may create relationships, as do filters, concatenations, ETL transformations, business rules and so forth.

In other words, in a relational database the metadata is insufficient to form a full picture of the relationships that exist within the data (at least, without so much manual intervention that it would be cost-prohibitive). One solution to this problem would be to use an associative database instead of a relational one, but that isn't going to happen. So the only other possible approach is to eschew the use of metadata and go directly to the data.

This is what a company called Exeros (which is Greek for "tracker") has done. It has a tool called DataMapper that starts with a database crawler that, rather like an internet spider, crawls through your database or databases and automatically discovers all of your relationships. Well, not actually all: the company reckons about 80 to 90 per cent of your relationships. But as a typical metadata-based approach would be lucky to find more than 10 to 20 per cent, this represents a very significant saving in the time and money you need to identify the rest manually.
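The article does not say how DataMapper works internally, so the following is not Exeros's method. It is a minimal sketch of one well-known way to go "directly to the data": compare the distinct values held in each pair of columns and flag pairs where one column's values are almost entirely contained in the other's. The function names, the 0.9 threshold and the two example tables are all assumptions made for illustration, and the sketch only compares columns in different tables.

```python
import sqlite3
from itertools import permutations


def column_values(conn, table, column):
    """Distinct non-null values actually stored in one column."""
    rows = conn.execute(
        f'SELECT DISTINCT "{column}" FROM "{table}" WHERE "{column}" IS NOT NULL')
    return {row[0] for row in rows}


def discover_relationships(conn, threshold=0.9):
    """Flag column pairs where one column's values are mostly contained in another's."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    columns = {}
    for table in tables:
        # PRAGMA table_info rows are (cid, name, type, ...); index 1 is the column name.
        for info in conn.execute(f'PRAGMA table_info("{table}")'):
            columns[(table, info[1])] = column_values(conn, table, info[1])
    found = []
    for (a, a_vals), (b, b_vals) in permutations(columns.items(), 2):
        if a[0] == b[0] or not a_vals:
            continue  # skip same-table pairs and empty columns
        overlap = len(a_vals & b_vals) / len(a_vals)
        if overlap >= threshold:
            found.append((a, b, overlap))
    return found


# Hypothetical data: no foreign key is declared between the two tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_no TEXT, name TEXT);
    CREATE TABLE invoice (ref TEXT, account TEXT, amount REAL);
    INSERT INTO customer VALUES ('C001', 'Acme'), ('C002', 'Globex'), ('C003', 'Initech');
    INSERT INTO invoice VALUES ('I1', 'C001', 10.0), ('I2', 'C001', 25.5), ('I3', 'C003', 7.0);
""")
found = discover_relationships(conn)
print(found)
```

Here invoice.account is flagged as depending on customer.cust_no even though the schema says nothing about it. A naive overlap test like this would throw up false positives on real data (small integer columns tend to overlap by accident), which is one reason a share of the relationships still has to be confirmed by hand.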

At present, DataMapper is limited to establishing one-to-one relationships either between or within data sources. In future, the company intends to extend its capabilities to capture multi-way relationships, but currently you would have to link these manually (for which there are capabilities in the product).

As far as I know there is no other product quite like this (though Sypherlink has some overlapping capability). When the present CTO and co-founder of the company first came up with the concept behind Exeros he was told it couldn't be done, so it is likely that the company has a considerable lead over potential competitors, although knowing that it can be done is a significant advantage for any followers.

Exeros already has a partnership with Informatica and is in talks with other data integration companies. The company clearly offers a distinct advantage to anyone who uses it, so it is an inevitable takeover target. The only questions will be who, how much and when?

Copyright © 2006, IT-Analysis.com
