IBM gets handle on unstructured data
Almaden incarnates search and BI
It is perhaps easy to assume that the notion of BI (business intelligence) for the masses - or 'DIYBI', as espoused here - is most likely to involve a sawn-off version of an existing BI tool, probably a mature one where the development costs have already been recovered.
In practice, this is somewhat less than likely, if only because most corporate BI tools are focused on working with structured data, which is found more extensively in large enterprises. It is already well understood that the majority of data in play in any business is unstructured and therefore not immediately well-suited to BI manipulations - and this is even more likely in the smaller businesses for which DIYBI might be attractive.
Developing technologies that can not only work with unstructured data but also extract information of real value to the user is an important step along the way to DIYBI for the masses, but it also involves technology that goes well beyond what might be called a typical BI tool of today.
At one level, the basics of the 'DIY' toolset already exist in the form of search engines such as Google, Yahoo! and MSN, among many others. But search is a very minor part of BI and, in any environment where there is an embarrassment of unstructured data riches, can by itself be more of a hindrance than a help.
The key here, according to Nelson Mattos, vice president of information & interaction research at IBM's Almaden Research Laboratories in California, is the ability to provide semantic analysis on both structured and unstructured data within a common environment.
"Users want to find whatever it is they want, regardless of where it is stored," he said. "They also want to continue working with the tools they have and know, such as spreadsheets and PowerPoint presentations."
To that end, user interfaces are of equal importance to them, which means that search engine technologies have now become the UI of choice, says Mattos. "Everyone is already familiar with the search engine model - everyone can type a few words and get a result - and I want to use that paradigm in the context of a business intelligence environment."
IBM's research work is not intended to service the DIYBI market but it fits in with the notion of BI for the masses, for it is designed to support users running tasks associated with their jobs rather than be a tool for BI specialists. This is the target for Project Avatar, currently under development at Almaden.
Its object is to provide the tools that allow users to extract insight out of both structured and unstructured data. "This is a very broad area that requires management of structured and unstructured data and the use of traditional search technologies," Mattos said, "though that is not sufficient, because if you look at data across the enterprise there are huge amounts, so unless there is some intelligence to help find the insights we just overwhelm users with the amount of information."
Mattos sees search engines and BI coming together as the world of unstructured data is reeled in to the business need for intelligence. “At the moment 80-85 per cent of the data stored on computers is unstructured,” he said, “so it would be good to have a common framework that will allow users to analyse a record or historical data to identify problems, issues or trends – analysing structured and unstructured data together.”
The paradigms used to interface with that combined framework may look like a search engine but they will be different. Text-related search technologies have, until recently, been built solely on keywords, not semantics. Some sophisticated algorithms have been developed that can look at keywords in context, recognising company names, technical components and the like. But they were developed in a proprietary fashion and that, Mattos suggests, is why they have never taken off.
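The difference between matching keywords and recognising a company name in context can be illustrated with a minimal Python sketch. It is purely illustrative and is not IBM's or Project Avatar's method: the sample documents, the KNOWN_COMPANIES and BUSINESS_CUES sets and both function names are assumptions invented for this example, and the "context" rule is deliberately crude.

```python
import re

# Hypothetical sample documents (not taken from the article).
DOCS = [
    "Apple reported higher quarterly revenue.",
    "The apple harvest was poor this year.",
    "IBM and Apple announced a partnership.",
]

# Hypothetical lookup tables; a real system would use far richer models.
KNOWN_COMPANIES = {"Apple", "IBM"}
BUSINESS_CUES = {"revenue", "partnership", "announced", "reported", "shares"}


def keyword_search(term, docs):
    """Return every document containing the term as a whole word, ignoring case."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [doc for doc in docs if pattern.search(doc)]


def company_search(name, docs):
    """Return documents that appear to mention `name` as a company.

    Crude context rule (an assumption for this sketch): the name must appear
    with its exact capitalisation, and the document must also contain at
    least one business-related cue word.
    """
    if name not in KNOWN_COMPANIES:
        return []
    hits = []
    for doc in docs:
        words = set(re.findall(r"[A-Za-z]+", doc))
        lowered = {word.lower() for word in words}
        if name in words and lowered & BUSINESS_CUES:
            hits.append(doc)
    return hits


if __name__ == "__main__":
    print(len(keyword_search("apple", DOCS)), "keyword hits")
    print(len(company_search("Apple", DOCS)), "company hits")
```

The keyword search returns the sentence about the apple harvest alongside the two about the company, while the context-aware version filters it out. Real semantic analysis is far more involved, but the sketch shows why keywords alone can overwhelm a user with irrelevant results.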
COMMENTS
Nice thoughts, but already implemented in InfoCodex.
Thanks for the interesting article. Once again IBM is giving us a great vision about the future and how unstructured information can be searched.
InfoCodex already does all this today with the help of a linguistic database and synonym and/or similarity search across 5 languages (German, French, Italian, English and Spanish). With InfoCodex you can search for a block of text in one language and it will find you all the similar documents in the other languages as well. All of this is done without one single minute of training - because of the linguistic database that contains 2.9 million words and terms (e.g. "European Court of Justice" or "The President of the United States" are terms and recognized as such).
See the following links:
http://www.ywesee.com/pmwiki.php/Ywesee/InfoCodexProcedure
http://www.ywesee.com/uploads/Ywesee/archimag-e.pdf
http://www.ywesee.com/uploads/Ywesee/Evaluationsentscheid-e.pdf
http://www.ywesee.com/uploads/Main/USP_e.pdf
Not really. But it is a fluff piece.
I think the main acronym is DIYBI or "Do it Yourself" BI (where BI is the industry term "Business Intelligence").
Nelson is actually one of the better guys in the lab, but I do agree that this article was a boring fluff piece.
I think the point is that IBM is extending their BI vision and that since more "unstructured" enterprise data is being captured, there needs to be a way to drill down and find meaning in that data.
I think that IBM is on the right track; however, a lot of the "unstructured" data is industry if not enterprise specific, and trying to create a "standardized toolkit" is about as far as you can go. Really it would be more of a toolkit recognizing the patterns of the "unstructured" data.
Using the Google-ing of webpages to find information as an example, the tool kit could comprise some HTML structure knowledge and an indexing scheme. It is this form of "intelligence" which is needed.
Of course IBM would need to rethink their extensibility beyond the limited capabilities found in DB2's extenders and apply this DIYBI to IDS first which has a robust enough engine to decrease the time to market and time to value....
But hey! What do I know? I'm just Gumby. ;-)
acronyms galore without expansion
A truly wonderful article which launches straight into deep technical terms without explaining anything. My eyes glazed over faster than Steve Ballmer breaking a chair over Steve Jobs's head.
