
Search me, guv...

The search engine - the last resort of the well-informed


Comment I'm always a little nervous about the idea of a search engine as the solution to the tide of "unstructured data" we're all drowning in. For a start, most of it isn't really unstructured - show me an unstructured email invoice and I'll show you something that is useless because you aren't sure who it came from and what it applies to. This means that by treating information which comes with structure and semantic metadata as “unstructured” we are increasing our business risk and making our business processing less efficient.

Secondly, we usually get more hits than we can cope with, so we reduce the number of hits by refining the search in a fairly arbitrary way (e.g. by sticking quotes around a key phrase) or, even worse, by just reading the first page or so, assuming this contains the important stuff. Surely, we all realise that information sources can be designed to optimise their own retrieval, so that they rank ahead of other, possibly more authoritative, sources. Again, this increases business risk.

Several approaches provide more useful searches that, for example, put company information resources ahead of the general internet - see, for example, Coppereye Greenwich's indexed approach here. This makes use of the structural information - metadata - that you already have elsewhere about the information you're trying to find in flat files such as transaction logs and audit trails. It can have a Google-like front end but is more specialised than Google.

An alternative, more general, approach that has always interested me is the Google Appliance, as exploited by Information Builders (see here). WebFOCUS Intelligent Search applies tags to company data and passes it, after processing against company security and access policies, to the Google Appliance, thus providing a richer Google search - because you know that you're searching information of relevance and have some idea of what search criteria will make sense.

Information Builders has taken this idea a step further again, by incorporating its Active Reports technology, to deliver user "self service" reports at the portal, in something called WebFOCUS Magnify. This means users can take a complete set of Google hits and apply spreadsheet-style reporting in near real-time - the hits can be categorised and sorted by category. So, you can carry out data mining and analysis against the data found by your search engine.

Navigation tree

WebFOCUS Magnify exploits iWay Software's integration technology, to enrich the content of an Information Bus, thus reducing the overheads involved in trawling, especially, databases for information. WebFOCUS Magnify can use metadata tags in this enriched content to produce a "navigation tree" that will help users find the content they need - even if it doesn't turn up on the first few pages of the search engine report.
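To make the "navigation tree" idea concrete, here is a minimal Python sketch of grouping search hits by their metadata tags. It is not Magnify's implementation or API: the hit format (a `title` plus an ordered list of `tags`), the sample data, and the function names `build_navigation_tree` and `render` are all assumptions made purely for illustration.

```python
def build_navigation_tree(hits):
    """Nest hits under their tag path, e.g. Finance -> Invoices.

    Assumes each hit is a dict with a "title" and an ordered list of "tags"
    (a hypothetical format). Titles are stored under the reserved key "_hits",
    so a real tag called "_hits" would collide with it.
    """
    tree = {}
    for hit in hits:
        node = tree
        for tag in hit["tags"]:
            node = node.setdefault(tag, {})
        node.setdefault("_hits", []).append(hit["title"])
    return tree


def render(tree, depth=0):
    """Return the tree as indented text lines: categories first, then hits."""
    lines = []
    for key in sorted(k for k in tree if k != "_hits"):
        lines.append("  " * depth + key)
        lines.extend(render(tree[key], depth + 1))
    for title in tree.get("_hits", []):
        lines.append("  " * depth + "- " + title)
    return lines


# Hypothetical search results, in the order the engine returned them.
hits = [
    {"title": "Invoice 1001", "tags": ["Finance", "Invoices"]},
    {"title": "Audit trail, March", "tags": ["Finance", "Audit"]},
    {"title": "Invoice 1002", "tags": ["Finance", "Invoices"]},
    {"title": "Staff handbook", "tags": ["HR"]},
]

tree = build_navigation_tree(hits)
print("\n".join(render(tree)))
```

The point of the sketch is that a hit's position in the tree depends only on its tags, not on its rank, so something buried on page 50 of the results is as easy to reach as something on page one.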

So far so good. This is an enterprise tool that's going to interest Information Builders' loyal customer-base. It's powerful, but it's not exactly cheap - Information Builders tends to deal with enterprise and government customers, with huge information stores and mission-critical applications.

However, Dave Armstrong from Information Builders presages much wider applications for Magnify in the future. Fundamentally, it is intended to be search-engine neutral - it currently supports both Google and Lucene. But, Armstrong says, it would have trouble with engines like Autonomy, which make use of metadata categorisation in their own way. So, you could imagine an ISP, say, using Magnify to provide a low- or zero-cost value-add service to its general customer base.

Nevertheless, if you have serious business-critical questions to answer, this sort of advanced search is only an enabling technology. Provenance, as my “information professional” wife points out, is all - where does the information come from, and can it be trusted?

A basic Google search isn't much help here, but at least WebFOCUS Intelligent Search, say, points you at categorised company information. Next, a lack of systematic bias helps. Obviously, online searches are limited to online information (see your librarian, or "company information professional"; in many fields, the authoritative information isn't online or isn’t available to search engines), and various "Google hacks" can bias the information in the first pages of a Google search. Magnify, say, could help here.

Then, you want to make use of everything you know about the semantics and structure of the information you’re searching, to make the search rather more efficient than, say, a Google keyword search returning hundreds of thousands of hits. Finally, as a last resort, you can spend the afternoon playing with Google or whatever, hoping that what you find (which is usually useful enough) doesn't miss out on something that you'd have really, really wanted to use - if you'd known it was there.

Information Builders' (and other vendors’) technology could be part of the solution to this "information" - as opposed to "data" - retrieval problem. But so too, of course, is employing an information professional, who actually understands the difference between Data and Information and knows about information quality and provenance issues, to “mentor” searchers.

Always remember, in this world of automated systems, that there is a lot more to the professional “librarian” than the stereotype suggests. For a start, compare the job spec here with that of the average IT professional, who is usually self-certified and needs no formally recognised education, although the BCS is manfully trying to address this challenge. ®
