Search me, guv...

The search engine - the last resort of the well-informed

Comment I'm always a little nervous about the idea of a search engine as the solution to the tide of "unstructured data" we're all drowning in. For a start, most of it isn't really unstructured - show me an unstructured email invoice and I'll show you something that is useless because you aren't sure who it came from and what it applies to. This means that by treating information which comes with structure and semantic metadata as “unstructured” we are increasing our business risk and making our business processing less efficient.

Secondly, we usually get more hits than we can cope with, so we reduce the number of hits by refining the search in a fairly arbitrary way (e.g. by sticking quotes around a key phrase) or, even worse, by just reading the first page or so, assuming this contains the important stuff. Surely, we all realise that information sources can be designed to optimise their own retrieval, so that they rank ahead of other, possibly more authoritative, sources. Again, this increases business risk.

Several approaches provide more useful searches that, for example, put company information resources ahead of the general internet - see, for example, Coppereye Greenwich's indexed approach here. This makes use of the structural information - metadata - that you already have elsewhere about the information you're trying to find in flat files such as transaction logs and audit trails. It can have a Google-like front end but is more specialised than Google.

An alternative, more general, approach that has always interested me is the Google Appliance, as exploited by Information Builders (see here). WebFOCUS Intelligent Search applies tags to company data and passes it, after processing against company security and access policies, to the Google Appliance, thus providing a richer Google search - because you know that you're searching information of relevance and have some idea of what search criteria will make sense.

Information Builders has taken this idea a step further again, by incorporating its Active Reports technology, to deliver user "self service" reports at the portal, in something called WebFOCUS Magnify. This means users can take a complete set of Google hits and apply spreadsheet-style reporting in near real-time - the hits can be categorised and sorted by category. So, you can carry out data mining and analysis against the data found by your search engine.
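To make the idea of categorising and sorting hits concrete, here is a minimal sketch in Python. It is not Information Builders' implementation: the hit records, the field names (`title`, `category`, `score`) and the function `group_hits_by_category` are all assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical search hits; the field names are assumptions for illustration.
hits = [
    {"title": "Q3 invoice summary", "category": "Finance", "score": 0.71},
    {"title": "Audit trail export", "category": "Compliance", "score": 0.64},
    {"title": "Supplier invoice 1042", "category": "Finance", "score": 0.88},
]


def group_hits_by_category(hits):
    """Group hits by category, sorting each group by descending score."""
    groups = defaultdict(list)
    for hit in hits:
        # Hits without a category tag fall into a catch-all bucket.
        groups[hit.get("category", "Uncategorised")].append(hit)
    # Categories are listed alphabetically; hits within each by relevance.
    return {
        category: sorted(members, key=lambda h: h["score"], reverse=True)
        for category, members in sorted(groups.items())
    }


report = group_hits_by_category(hits)
for category, members in report.items():
    print(category, [h["title"] for h in members])
```

The point of the sketch is only that once hits carry a category tag, the full result set can be reorganised like a spreadsheet rather than read top-down from page one.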

Navigation tree

WebFOCUS Magnify exploits iWay Software's integration technology to enrich the content of an Information Bus, thus reducing the overheads involved in trawling data sources, databases especially, for information. WebFOCUS Magnify can use metadata tags in this enriched content to produce a "navigation tree" that will help users find the content they need - even if it doesn't turn up on the first few pages of the search engine report.
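A navigation tree of this kind can be sketched in a few lines of Python. This is an illustration of the general technique, not Magnify's actual design: it assumes each document carries an ordered list of metadata tags (broadest first), and the names `build_navigation_tree`, `render` and the reserved `_docs` key are hypothetical.

```python
# Hypothetical tagged documents; tags are ordered from broadest to narrowest.
documents = [
    {"title": "Supplier invoice 1042", "tags": ["Finance", "Invoices"]},
    {"title": "Q3 ledger", "tags": ["Finance", "Ledgers"]},
    {"title": "Audit trail export", "tags": ["Compliance"]},
]


def build_navigation_tree(documents):
    """Nest documents under their tag path, one dict level per tag."""
    tree = {}
    for doc in documents:
        node = tree
        for tag in doc["tags"]:
            node = node.setdefault(tag, {})
        # "_docs" is a reserved key (an assumption: no tag uses that name).
        node.setdefault("_docs", []).append(doc["title"])
    return tree


def render(tree, depth=0):
    """Return the tree as indented lines, branches first, then documents."""
    lines = []
    for key in sorted(k for k in tree if k != "_docs"):
        lines.append("  " * depth + key)
        lines.extend(render(tree[key], depth + 1))
    for title in tree.get("_docs", []):
        lines.append("  " * depth + "- " + title)
    return lines


tree = build_navigation_tree(documents)
print("\n".join(render(tree)))
```

A user browsing this tree reaches "Audit trail export" through the Compliance branch regardless of where a keyword search would have ranked it.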

So far so good. This is an enterprise tool that's going to interest Information Builders' loyal customer-base. It's powerful, but it's not exactly cheap - Information Builders tends to deal with enterprise and government customers, with huge information stores and mission-critical applications.

However, Dave Armstrong from Information Builders foresees much wider applications for Magnify in the future. Fundamentally, it is intended to be search-engine neutral - it currently supports both Google and Lucene. But, Armstrong says, it would have trouble with engines like Autonomy, which make use of metadata categorisation in their own way. So, you could imagine an ISP, say, using Magnify to provide a low- or zero-cost value-add service to its general customer base.

Nevertheless, if you have serious business-critical questions to answer, this sort of advanced search is only technology-enabling. Provenance, as my “information professional” wife points out, is all - where does the information come from and can it be trusted.

A basic Google search isn't much help here, but at least WebFOCUS Intelligent Search, say, points you at categorised company information. Next, a lack of systematic bias helps. Obviously, online searches are limited to online information (see your librarian, or "company information professional": in many fields, the authoritative information isn't online or isn't available to search engines), and various "Google hacks" can bias which information appears on the first pages of a Google search. Magnify, say, could help here.

Then, you want to make use of everything you know about the semantics and structure of the information you’re searching, to make the search rather more efficient than, say, a Google keyword search returning hundreds of thousands of hits. Finally, as a last resort, you can spend the afternoon playing with Google or whatever, hoping that what you find (which is usually useful enough) doesn't miss out on something that you'd have really, really wanted to use - if you'd known it was there.

Information Builders' (and other vendors’) technology could be part of the solution to this "information" - as opposed to "data" - retrieval problem. But so too, of course, is employing an information professional, who actually understands the difference between Data and Information and knows about information quality and provenance issues, to “mentor” searchers.

Always remember, in this world of automated systems, that there is a lot more to the professional “librarian” than the stereotype suggests. For a start, compare the job spec here with that of the average IT professional, who is usually self-certified and needs no formally recognised education, although the BCS is manfully trying to address this challenge. ®
