The devil is in the details

Grabbing data quality by the tail

Editor's blog: It's well known that data quality is often the devil you find hiding in the implementation detail of many big integration (and other) projects. It's a particular issue when companies merge. Of course, "your" salespeople always fill in contact and customer details accurately and honestly but "their" salespeople apparently got through the volumes by inventing contacts, selling to the entire population of Disneyworld - and if the spelling of a name looked a little odd they simply put up a new customer with a different spelling. Now, your merged database contains twice as many customers as it should, the stock market is taking an interest in the merged companies - and, hey, I wonder if there are compliance implications...

Easy enough to fix, of course, with an army of clerical people - if you have the resources (money and time, lots of both) and if your business activities can cope with a poor quality database and still maintain good service levels.

There must be a better way. Well, according to Hugo Verwijs, CEO of Human Inference, there is. Not simply the usual "quick fix" tool but a process change that deals with data quality at the appropriate places in the operational lifecycle - plus, of course, automated tools (its HIquality Suite for Total Data Quality Management) that help to ensure consistency and reduce the manual overhead.

What makes the Human Inference solution different from most ordinary validation is that it doesn't simply think of quality in terms of data formats (rejecting blank fields, numeric surnames and that sort of thing); it deals with the semantics of the data. It treats data quality in a business context. So, it can potentially recognise common misspellings, "translations" of foreign names - even the names of well-known film stars and cartoon characters used by people who can't be bothered filling in forms.
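To make the difference between format checks and semantic checks concrete, here is a minimal Python sketch. It is not how Human Inference's tools work (the article doesn't say); it simply uses the standard library's `difflib` to flag near-duplicate spellings and a tiny, made-up list of placeholder names. The names `SPOOF_NAMES`, `check_name` and the 0.85 threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical list of placeholder names people type to skip a form.
SPOOF_NAMES = {"mickey mouse", "donald duck", "johnny depp"}


def normalise(name: str) -> str:
    """Lower-case a name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two normalised names."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def check_name(name: str, existing: list[str], threshold: float = 0.85) -> list[str]:
    """Return human-readable warnings; an empty list means nothing was flagged."""
    warnings = []
    if normalise(name) in SPOOF_NAMES:
        warnings.append("matches a well-known name; review in context")
    for other in existing:
        # The threshold is an illustrative guess, not a tuned value.
        if similarity(name, other) >= threshold:
            warnings.append(f"possible duplicate of {other!r}")
    return warnings
```

Note that the function returns warnings rather than rejecting anything: a close match is only a hint that a person (or a smarter process) should take a look.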

When a website annoys you by making you type in reams of validated information before dispensing a white paper or something, it is amazing what rubbish conventional validation will accept. Human Inference tools can do better - leaving the site designer to think about why s/he is annoying potential customers in the first place and about the real business value of the data being collected.

Human Inference deals with large companies with specific problems. It isn't really interested in commoditised off-the-shelf solutions for small companies - not yet anyway - when pulling big companies out of potential data quality disasters is such good business. And, of course, there are other ways to validate data semantically – you can hard-code domain knowledge for validation (although this isn’t terribly “agile”), validate data against its essential properties (discovered during analysis), or use “hard typing” to reject invalid data rigorously at entry. The key point to remember is that allowing poor quality data into your database is dangerous to your business and will be expensive to fix later. And also that “poor quality” has to be defined in a business context and usually isn’t a “yes/no” thing – if you see “Johnny Depp” in a showbiz context it is possibly OK; if he’s a potential customer for your Layer 4-7 Switch technology, then you might wonder – although you still can’t just assume it’s a spoof name.
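Two of these ideas can be sketched in a few lines of Python: "hard typing", where a structurally invalid value can never be constructed, and a context-dependent score in place of a yes/no verdict. Everything here is hypothetical: the `ContactName` type, the `CELEBRITY_NAMES` list, the context labels and the confidence numbers are invented for illustration and would need to come from real domain analysis.

```python
from dataclasses import dataclass

# Hypothetical list; a real one would come from domain analysis.
CELEBRITY_NAMES = {"johnny depp"}


@dataclass(frozen=True)
class ContactName:
    """A 'hard-typed' name: structurally invalid values never get constructed."""
    value: str

    def __post_init__(self):
        cleaned = " ".join(self.value.split())
        if not cleaned or any(ch.isdigit() for ch in cleaned):
            raise ValueError(f"not a plausible name: {self.value!r}")
        # The dataclass is frozen, so bypass normal attribute assignment.
        object.__setattr__(self, "value", cleaned)


def name_confidence(name: ContactName, context: str) -> float:
    """Return a 0..1 confidence that the name is genuine in this context.

    The numbers are illustrative placeholders, not calibrated values.
    """
    if name.value.lower() not in CELEBRITY_NAMES:
        return 0.9
    # Same name, different verdict depending on where it appears.
    return 0.8 if context == "showbiz" else 0.3
```

Even in the low-confidence case the score is well above zero, which matches the point above: an unlikely name is a reason to wonder, not proof of a spoof.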

This sort of fuzzy data quality issue is ubiquitous - and not just in databases. Mistyped email addresses are a dynamic data quality problem. With a "norfolk.org.uk" domain, my email queue collects a worrying amount of information that it shouldn't - private medical records, criminal records, health records etc, all intended for some branch of Norfolk's local government or similar organisations with a duty of confidentiality. Shouldn't commodity email software used in such places automatically learn not to send confidential matter to me after I've returned a couple of mis-addressed emails - or, even, learn to notice proactively that local government communications may be being sent outside local government (by checking the form of the address)? At the least, email software could reduce one addressing quality issue by forcing people to select To and CC addresses from an officially sanctioned directory. It can't be that hard to discover that "norfolk.org.uk" isn't a government address, at least to a certain degree of confidence!
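Checking the form of the address is the easy part, as this rough Python sketch suggests. It assumes that "inside local government" can be approximated by a short allowlist of domain suffixes (here just `gov.uk`) and that the mail client somehow knows a message is confidential; both are simplifications, and `TRUSTED_SUFFIXES`, `is_trusted` and `recipients_needing_review` are made-up names, not features of any real email product.

```python
from email.utils import parseaddr

# Assumed allowlist of domain suffixes treated as "inside" the organisation.
TRUSTED_SUFFIXES = ("gov.uk",)


def is_trusted(address: str) -> bool:
    """True if the address's domain is a trusted suffix or a subdomain of one."""
    _, addr = parseaddr(address)
    domain = addr.rpartition("@")[2].lower()
    return any(domain == s or domain.endswith("." + s) for s in TRUSTED_SUFFIXES)


def recipients_needing_review(recipients: list[str], confidential: bool) -> list[str]:
    """Return the recipients a confidential message should not go to unchecked."""
    if not confidential:
        return []
    return [r for r in recipients if not is_trusted(r)]
```

For example, `recipients_needing_review(["clerk@norfolk.gov.uk", "someone@norfolk.org.uk"], True)` returns only the `norfolk.org.uk` address, which the sender could then be asked to confirm before the message leaves.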

Well, of course it isn’t, if you’re a person. Getting a computer to do it without bothering you overmuch doesn’t seem to be as easy as you’d hope.®
