Search me, guv...

The search engine - the last resort of the well-informed

Comment I'm always a little nervous about the idea of a search engine as the solution to the tide of "unstructured data" we're all drowning in. For a start, most of it isn't really unstructured - show me an unstructured email invoice and I'll show you something that is useless because you aren't sure who it came from and what it applies to. This means that by treating information which comes with structure and semantic metadata as “unstructured” we are increasing our business risk and making our business processing less efficient.

Secondly, we usually get more hits than we can cope with, so we reduce the number by refining the search in a fairly arbitrary way (e.g. by sticking quotes around a key phrase) or, even worse, by just reading the first page or so and assuming it contains the important stuff. Surely we all realise that information sources can be designed to optimise their own retrieval, so that they turn up ahead of other, possibly more authoritative, sources? Again, this increases business risk.

Several approaches provide more useful searches that put company information resources ahead of the general internet - see, for example, Coppereye Greenwich's indexed approach here. This makes use of the structural information - metadata - that you already hold elsewhere about the information you're trying to find in flat files such as transaction logs and audit trails. It can have a Google-like front end but is more specialised than Google.

An alternative, more general, approach that has always interested me is the Google Appliance, as exploited by Information Builders (see here). WebFOCUS Intelligent Search applies tags to company data and passes it, after processing against company security and access policies, to the Google Appliance, thus providing a richer Google search - because you know that you're searching information of relevance and have some idea of what search criteria will make sense.
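To make the "tag, then check against policy, then index" idea concrete, here is a minimal sketch in Python. It is not Information Builders' API or the Google Appliance feed format: the record fields (`source`, `doc_type`, `classification`), the `policy` dictionary and the function name are all assumptions made up for illustration.

```python
# Hypothetical company records; every field name here is an assumption.
records = [
    {"title": "Acme invoice 1041", "source": "billing", "doc_type": "invoice",
     "classification": "internal"},
    {"title": "Board minutes", "source": "legal", "doc_type": "minutes",
     "classification": "restricted"},
]

# Hypothetical access policy: only these classifications may reach the index.
policy = {"indexable": {"public", "internal"}}

def prepare_for_indexing(records, policy):
    """Tag each record and keep only those the policy allows to be indexed."""
    prepared = []
    for record in records:
        if record["classification"] not in policy["indexable"]:
            continue  # never hand restricted material to the search engine
        # Carry the structure we already know along as searchable tags.
        tagged = dict(record, tags=[record["source"], record["doc_type"]])
        prepared.append(tagged)
    return prepared

indexable = prepare_for_indexing(records, policy)
for record in indexable:
    print(record["title"], record["tags"])
```

The point of the sketch is the ordering: the policy check happens before anything is passed to the search engine, and the tags travel with the record so that later searches can use them.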

Information Builders has taken this idea a step further again, by incorporating its Active Reports technology, to deliver user "self service" reports at the portal, in something called WebFOCUS Magnify. This means users can take a complete set of Google hits and apply spreadsheet-style reporting in near real-time - the hits can be categorised and sorted by category. So, you can carry out data mining and analysis against the data found by your search engine.
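As a rough illustration of "categorise and sort" applied to a set of hits, the sketch below groups hits by a category field and orders each group by relevance. It says nothing about how Active Reports works internally; the hit fields (`title`, `category`, `score`) and the sample data are invented for the example.

```python
from collections import defaultdict

# Hypothetical search hits; the field names are assumptions for illustration.
hits = [
    {"title": "Q3 invoice, Acme", "category": "invoices", "score": 0.71},
    {"title": "Audit trail, February", "category": "audit", "score": 0.64},
    {"title": "Q2 invoice, Acme", "category": "invoices", "score": 0.93},
]

def group_hits(hits):
    """Group hits by category, best-scoring first within each category."""
    groups = defaultdict(list)
    for hit in hits:
        groups[hit["category"]].append(hit)
    # Categories in alphabetical order, hits in descending score order.
    return {
        category: sorted(items, key=lambda h: h["score"], reverse=True)
        for category, items in sorted(groups.items())
    }

report = group_hits(hits)
for category, items in report.items():
    print(category, len(items), [h["title"] for h in items])
```

Even this toy version shows why the approach helps: a hit buried on page five of a flat list becomes visible once it sits at the top of its own category.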

Navigation tree

WebFOCUS Magnify exploits iWay Software's integration technology, to enrich the content of an Information Bus, thus reducing the overheads involved in trawling, especially, databases for information. WebFOCUS Magnify can use metadata tags in this enriched content to produce a "navigation tree" that will help users find the content they need - even if it doesn't turn up on the first few pages of the search engine report.
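A navigation tree of this kind can be sketched in a few lines of Python, assuming each document carries an ordered list of metadata tags running from general to specific. That assumption, the `_docs` key and the function names are mine, not a description of how Magnify or iWay actually store their metadata.

```python
# Hypothetical tagged documents; tags run from general to specific.
documents = [
    {"title": "Acme invoice 1041", "tags": ["finance", "invoices"]},
    {"title": "Expenses policy", "tags": ["finance"]},
    {"title": "Server audit log", "tags": ["it", "audit"]},
]

def build_tree(documents):
    """Nest documents under their tag path; titles live under '_docs'."""
    tree = {}
    for doc in documents:
        node = tree
        for tag in doc["tags"]:
            node = node.setdefault(tag, {})
        node.setdefault("_docs", []).append(doc["title"])
    return tree

def render(tree, depth=0):
    """Return the tree as indented lines: branches first, then documents."""
    lines = []
    for key in sorted(k for k in tree if k != "_docs"):
        lines.append("  " * depth + key)
        lines.extend(render(tree[key], depth + 1))
    for title in tree.get("_docs", []):
        lines.append("  " * depth + "- " + title)
    return lines

tree = build_tree(documents)
print("\n".join(render(tree)))
```

Users browse down the branches rather than paging through a ranked list, so a document's position in the search engine's ordering stops mattering.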

So far so good. This is an enterprise tool that's going to interest Information Builders' loyal customer-base. It's powerful, but it's not exactly cheap - Information Builders tends to deal with enterprise and government customers, with huge information stores and mission-critical applications.

However, Dave Armstrong from Information Builders presages much wider applications for Magnify in the future. Fundamentally, it is intended to be search-engine neutral - it currently supports both Google and Lucene. But, Armstrong says, it would have trouble with engines like Autonomy, which make use of metadata categorisation in their own way. So, you could imagine an ISP, say, using Magnify to provide a low- or zero-cost value-add service to its general customer base.

Nevertheless, if you have serious business-critical questions to answer, this sort of advanced search is only an enabling technology. Provenance, as my “information professional” wife points out, is all - where does the information come from and can it be trusted?

A basic Google search isn't much help here, but at least WebFOCUS Intelligent Search, say, points you at categorised company information. Next, a lack of systematic bias helps. Obviously, online searches are limited to online information (see your librarian, or "company information professional"; in many fields, the authoritative information isn't online or isn’t available to search engines), but on top of that, various "Google hacks" bias the information that appears in the first pages of a Google search. Magnify, say, could help here.

Then, you want to make use of everything you know about the semantics and structure of the information you’re searching, to make the search rather more efficient than, say, a Google keyword search returning hundreds of thousands of hits. Finally, as a last resort, you can spend the afternoon playing with Google or whatever, hoping that what you find (which is usually useful enough) doesn't miss out on something that you'd have really, really wanted to use - if you'd known it was there.

Information Builders' (and other vendors’) technology could be part of the solution to this "information" - as opposed to "data" - retrieval problem. But so too, of course, is employing an information professional, who actually understands the difference between Data and Information and knows about information quality and provenance issues, to “mentor” searchers.

Always remember, in this world of automated systems, that there is a lot more to the professional “librarian” than the stereotype suggests. For a start, compare the job spec here with that of the average IT professional, who is usually self-certified and needs no formally recognised education, although the BCS is manfully trying to address this challenge. ®