DIY and nearly BI
A useful sub-set of BI for those who don't need the Full Monty
Column My esteemed colleague on Reg Developer, Martin Banks, has argued that do-it-yourself BI (Business Intelligence) is a trend worth watching:
As he said…
“The premise being put forward by companies looking to move into DIYBI is that BI so far is only being performed by the largest enterprises, and then only by white-coated rocket scientists operating as something close to a secret society. Now, they say, it is time for BI to be open to ‘the masses’ as a self-service offering that no longer requires a trip to the BI Masonic lodge.”
El Reg was immediately contacted by several companies saying “But that’s what OUR products do!” So, ever curious, here we are, having a look at one such company, Ardentia, and its product NetSearch.
The job of a ‘standard’ BI system is to allow people to find useful information in the mass of data that is stored in the company’s database systems (finance, HR, Sales etc.).
Ardentia argues that data is to be found not only in the database systems: large quantities are also locked up in files on file servers, in content management systems, on web servers – in other words, in a huge range of different, less formally structured formats. NetSearch indexes and searches both structured and unstructured data in exactly the same way and can present the user with a single interface that allows it all to be searched. As the company’s web site so ungrammatically tells us: “Utilising NetSearch ensure the cost of delivering information to your end users is reduced. While providing the ability to react faster to changing requirements and circumstances.”
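To make the "one index, one search box" idea concrete, here is a minimal sketch of a toy inverted index that treats the text of a file and the cells of a database table in the same way. It is an illustration only, not a description of how NetSearch works internally; the class `ToyIndex`, the file path and the `animals` table are all invented for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ToyIndex:
    """Hypothetical inverted index: word -> set of places it occurs."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, source, text):
        # Unstructured data: index every word of the text under one source label.
        for token in tokenize(text):
            self.postings[token].add(source)

    def add_rows(self, table, rows):
        # Structured data: index each cell, labelled with table, row and column.
        for row_number, row in enumerate(rows, start=1):
            for column, value in row.items():
                self.add(f"{table}#{row_number}.{column}", str(value))

    def search(self, word):
        # One lookup covers both kinds of source.
        return sorted(self.postings.get(word.lower(), set()))


index = ToyIndex()
index.add("docs/zoo-report.txt", "The penguin enclosure reopens in May.")
index.add_rows("animals", [
    {"name": "Pingu", "species": "Penguin"},
    {"name": "Shaun", "species": "Sheep"},
])
hits = index.search("penguin")
```

A search for "penguin" here returns one hit from the text file and one from the table, which is the behaviour the single-interface claim amounts to.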
In addition, the company argues that people already know how to search the vastness of the internet using search engines, so it makes sense to apply the same searching concept to the data that a company stores. That makes it suitable for both small businesses and larger enterprises.
Licence charges vary from a five-seat cost of £5,000, to 500-seats at £60,000. To find out if it is what’s needed, you can download a trial copy here, so we did. It installs easily and the documentation tells you how to set it up as an administrator.
NetSearch provides the administrator with a number of Wizards which can be used, for example, to set up the file directories, databases, web sites and mail systems that should be included in the indexing process.
The term ‘Wizard’ often implies an easy-to-use, friendly kind of process. As Wizards go, however, these lean more to the Saruman school of wizardry than the Gandalf. For example, you are clearly expected to know the magic incantations that friendlier wizards supply for you. Take a look at the Simple Database Wizard.
Under URL it helpfully suggests the string:
Of course this won’t work unmodified and your mission, should you choose to accept it, is to modify it to the correct string. If you don’t know how, you have a mission impossible.
The documentation tells you that: “Any text that appears in angle brackets must be edited to ensure NetSearch can index the data source correctly. You may, however also need to edit other details in this fields. The format of this string depends on the JDBC driver that you have specified in the Driver field. For more information about the connection string contact your database provider.”
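For readers who have never met one of these strings, the sketch below shows what "editing the text in angle brackets" amounts to. The template used is an assumption: it follows the general shape of a MySQL JDBC URL and is not the string the wizard suggested. The helper `fill_placeholders` and the host, port and database values are likewise invented for the example.

```python
import re

# Matches any <name> placeholder in a template string.
PLACEHOLDER = re.compile(r"<([^<>]+)>")


def fill_placeholders(template, values):
    """Replace every <name> in template with values[name]; fail loudly if one is missing."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder <{name}>")
        return str(values[name])

    return PLACEHOLDER.sub(substitute, template)


# Assumed template, in the general style of a MySQL JDBC URL.
template = "jdbc:mysql://<host>:<port>/<database>"
url = fill_placeholders(
    template,
    {"host": "db.example.com", "port": 3306, "database": "sales"},
)
```

The point is that every value has to come from someone who knows the server, the port and the database name; the exact format still depends on the JDBC driver in use.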
And, further down on the same screen, you’ll notice that not only is a knowledge of SQL essential, but you also have to know the specific database schema. None of this is impossible, but Wizards usually shield their users from this level of detail by reading the metadata from the database.
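As a rough idea of what "reading the metadata" means, the sketch below asks a database to list its own tables and columns, so that a wizard could offer them as choices instead of expecting hand-typed SQL. It uses Python's built-in `sqlite3` module and an in-memory database purely for illustration; the `customers` and `orders` tables are invented, and other database engines expose their catalogues differently.

```python
import sqlite3


def describe_schema(conn):
    """Return {table_name: [column_name, ...]} read from SQLite's own catalogue."""
    schema = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [column[1] for column in columns]
    return schema


# Hypothetical schema, created only so there is something to inspect.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
schema = describe_schema(conn)
```

With a result like this in hand, a setup screen can present pick-lists of tables and columns rather than a blank box.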
Once the data sources have been defined and the indexes created, users can log in and start searching. One great feature of NetSearch is its ability to catalogue files (such as Word documents, Excel files, etc) simply as raw files.
At this point you may be thinking, “Yes, but what about Google Desktop Enterprise Edition? That looks good and works well.” True. But it is essentially designed to work with unstructured data. One of the strengths of NetSearch is its ability to combine the search intelligently within structured sets of data (such as Access .MDB files) that it finds within the generally unstructured data in a file system.
To illustrate this, the output you see here is the result of pointing NetSearch at a random file structure and searching for the words “penguin”, “foo” and “baa”.
It has found a collection of Access database files that (for reasons which will probably never become apparent) do actually contain those words and is clearly capable of intelligently reading the file structure, correctly interpreting it and indexing the contents, as we can see if we drill into one of these results using the Info link…
So, how does this product live up to the ideal of DIYBI that Martin described?
It is important to remember what NetSearch does: it allows us to find data, no matter where it exists in the company. Traditional BI systems do more: they turn data into information. That is an incredibly valuable transition, a crucial one, and often non-trivial to accomplish. Very frequently it involves the aggregation of the base level data so that business users can start with an overview of what is happening in the company before drilling down into the detail. Almost inevitably it also involves data cleansing and the agreement of data definitions and standards across the company. I’m not, for a minute, suggesting that Ardentia doesn’t know this; indeed their website talks about these issues.
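For anyone unsure what aggregation and drill-down look like in practice, here is a minimal sketch using a handful of invented sales rows. The figures, regions and function names are all hypothetical, and a real BI system would do this against cleansed data in a warehouse rather than a Python list.

```python
from collections import defaultdict

# Invented base-level data: one row per sale.
sales = [
    {"region": "North", "product": "Widgets", "amount": 120},
    {"region": "North", "product": "Gadgets", "amount": 80},
    {"region": "South", "product": "Widgets", "amount": 200},
]


def totals_by(rows, key):
    """Aggregate: sum the amount for each distinct value of key."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row["amount"]
    return dict(totals)


def drill_down(rows, key, value):
    """Drill down: keep only the rows where key equals value."""
    return [row for row in rows if row[key] == value]


# Start with the overview by region...
overview = totals_by(sales, "region")
# ...then drill into one region and break it down by product.
north_by_product = totals_by(drill_down(sales, "region", "North"), "product")
```

The overview gives one figure per region; the drill-down explains where the North figure comes from.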
The fact that NetSearch doesn’t help with data cleansing, aggregation or data definitions doesn’t invalidate the product’s aspirations to the BI world, because BI is a broad church and NetSearch has a Unique Selling Proposition – the ability to allow users to correlate data across formal, structured systems (such as source databases) and unstructured data such as that found in file systems. Think of NetSearch as a tool in your BI toolbox rather than a complete solution. ®