Fake news is fake data, 'which makes it our problem', info-slurpers told
Top Gartner tips: Know what data you hold, be trustworthy
Data-hungry organisations have been advised to get a better grip on the data they control and work on building trust.
In a week when analytic technologies have had more press than ever, many of the discussions at the Gartner Data and Analytics Summit, which ran in London from 19-22 March, focused on dealing with the data generated in recent slurp-happy years.
"Never has there been a moment [where] forces outside our world are so relevant to our industry… and vice versa," said Gartner distinguished analyst Ted Friedman.
For instance, he argued that fake news is fake data, "which makes it our problem". As such, gaining and retaining people's trust should be "job number one for everybody in this room".
One way to increase trust in data, he said, is to make sure it is better managed and recorded. This will be crucial if organisations are to explain why a predictive model treats one customer differently to another, he said.
"This demands a mighty impressive data foundation," Friedman went on – but noted that some organisations would have to pay the price of "dumping all their unintegrated data into a data lake".
Meanwhile, new companies are spawning to take advantage of the situation, as the boom in data lakes has left businesses struggling to inventory distributed data assets and classify disorganised data.
"There are many new data cataloguing software vendors... Data catalogues are the new black," he said.
At the same time, Friedman said that the "self service" data initiatives that had been thrown at companies in recent years were starting to reach their limit as data complexity increases.
However, Gartner research veep Kurt Schlegel said in a separate session that companies should "make complexity a competitive advantage".
In order to do so, they need to better consider the context of the data they hold, which he said would be crucial for integration.
He also argued that analytics could be boiled down to classification – identifying the most important attributes, finding associations and clusters, and blending disparate data sets.
Once data has been classified, he said, decisions can be made, and he again picked up on the need to build transparency information into the system.
The issue of ensuring explainability also topped Teradata CTO Stephen Brobst's list of challenges for data professionals, especially ahead of rules to be introduced in the incoming General Data Protection Regulation.
With the growing use of deep learning and multi-layer neural networks, he told The Register, this will become increasingly difficult to achieve.
"Right now, the way scoring is done is through very simple transparent mathematical formula - but with a multi layer neural network, it's anything but," he said.
"It's non-linear math, the data goes through a lot of transformations, and actually it's a learning system so the data is changing all the time, so you can't actually explain how the decision is made."
If such a system denies someone a mortgage, for instance, the company can't say "our black box says there's 33 per cent chance you're not going to pay", he said. "This is unacceptable."
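To make the contrast concrete, here is a minimal sketch of the kind of "simple transparent mathematical formula" Brobst describes: a logistic score in which each input's contribution can be read off directly and reported as a reason. The feature names, weights and bias are hypothetical, chosen purely for illustration; nothing here comes from Teradata, Danske Bank or any real lender.

```python
import math

# Hypothetical weights for an illustrative scoring model (assumed, not real).
WEIGHTS = {"missed_payments": 0.8, "debt_to_income": 2.0, "years_at_address": -0.1}
BIAS = -2.0


def default_probability(applicant):
    """Return (probability, contributions) for a simple logistic score."""
    # Each feature contributes weight * value to the log-odds, so the
    # effect of every input on the final score is directly visible.
    contributions = {name: WEIGHTS[name] * applicant[name] for name in WEIGHTS}
    logit = BIAS + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-logit))
    return probability, contributions


def top_reasons(contributions, n=2):
    """Names of the features that raised the score the most, largest first."""
    positive = [(name, value) for name, value in contributions.items() if value > 0]
    ranked = sorted(positive, key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:n]]


# Hypothetical applicant record.
applicant = {"missed_payments": 2, "debt_to_income": 0.5, "years_at_address": 3}
probability, contributions = default_probability(applicant)
reasons = top_reasons(contributions)
print(f"Estimated default risk: {probability:.0%}; main factors: {reasons}")
```

With a model of this shape the lender can say which factors drove the number. A multi-layer neural network offers no such per-feature breakdown out of the box, which is the gap Brobst is pointing to.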
Brobst pointed to work Teradata was doing with Danske Bank on fraud detection algorithms that include an integration layer on top of the machine learning models to ensure the blocks can be interpreted.
"We have to have this explainability," he said. "We're in a race with GDPR – it's only two months away, but we think we're going to get there." ®