The Register® — Biting the hand that feeds IT

Comments on: IBM gets handle on unstructured data

acronyms galore without expansion 

Posted Tuesday 8th August 2006 09:22 GMT

A truly wonderful article which launches straight into deep technical terms without explaining anything. My eyes glazed over faster than Steve Ballmer breaking a chair over Steve Jobs's head.

Not really. But it is a fluff piece. 

Posted Tuesday 8th August 2006 13:42 GMT

I think the main acronym is DIYBI or "Do it Yourself" BI (where BI is the industry term "Business Intelligence").

Nelson is actually one of the better guys in the lab, but I do agree that this article was a boring fluff piece.

I think the point is that IBM is extending their BI vision and that since more "unstructured" enterprise data is being captured, there needs to be a way to drill down and find meaning in that data.

I think that IBM is on the right track. However, a lot of the "unstructured" data is industry specific, if not enterprise specific, and trying to create a "standardized toolkit" is about as far as you can go. Really it would be more of a toolkit for recognizing the patterns in the "unstructured" data.

Using the Google-ing of webpages to find information as an example, the toolkit could comprise some knowledge of HTML structure and an indexing scheme. It is this form of "intelligence" which is needed.
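As a rough illustration of what "HTML structure knowledge plus an indexing scheme" might look like, here is a minimal Python sketch. It is not anything IBM or Google ships; the names `TextExtractor`, `build_index` and `search` are made up for this example. It only assumes that the toolkit knows enough about HTML to skip `script` and `style` content, and that the index is a plain inverted index from word to page.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, ignoring <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of page ids whose visible text contains it."""
    index = defaultdict(set)
    for page_id, html in pages.items():
        parser = TextExtractor()
        parser.feed(html)
        parser.close()
        for word in tokenize(" ".join(parser.chunks)):
            index[word].add(page_id)
    return index


def search(index, query):
    """Return the page ids that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

Calling `build_index` on a dictionary of page id to HTML string and then `search(index, "sales memo")` returns only the pages whose visible text contains both words. Anything industry or enterprise specific (field names, document layouts) would have to be layered on top of this by whoever owns the data.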

Of course IBM would need to rethink their extensibility beyond the limited capabilities found in DB2's extenders and apply this DIYBI to IDS first, which has a robust enough engine to decrease the time to market and time to value....

But hey! What do I know? I'm just Gumby. ;-)

Nice thoughts, but already implemented in InfoCodex. 

Posted Tuesday 8th August 2006 14:22 GMT

Thanks for the interesting article. Once again IBM is giving us a great vision about the future and how unstructured information can be searched.

InfoCodex already does all this today with the help of a linguistic database and synonym and/or similarity search across 5 languages (German, French, Italian, English and Spanish). With InfoCodex you can search for a block of text in one language and it will find you all the similar documents in the other languages as well. All of this is done without a single minute of training, because the linguistic database contains 2.9 million words and terms (e.g. "European Court of Justice" or "The President of the United States" are terms and are recognized as such).

See the following links:

http://www.ywesee.com/pmwiki.php/Ywesee/InfoCodexProcedure

http://www.ywesee.com/uploads/Ywesee/archimag-e.pdf

http://www.ywesee.com/uploads/Ywesee/Evaluationsentscheid-e.pdf

http://www.ywesee.com/uploads/Main/USP_e.pdf
