Automated, insight cannot be: Jedi master of statistics was good – but beware the daft side
Hans Rosling is gone, but dashboards and graphs are dangerous to knowledge
So, farewell then Hans Rosling, educator and "Jedi master of data visualisation". For in a world increasingly addicted to alternative facts, you pioneered software – Gapminder – and a viewer-friendly approach with bubble charts that allowed you to communicate simple, important messages about the world through the medium of the TED Talk.
It's all about the visualisation which enabled you, as in this short clip on world wealth and life expectancy, to take some 120,000 bits of data and condense them into one intelligent whole illustrating a basic truth about income and life expectancy.
What's not to like? In the hands of a skilled presenter, visualisation adds greatly to the value of data. Yet in the wider world, the process of reducing complexity to simple visuals can be confusing or even actively misleading.
Sometimes, the misdirection is unintended. Take pie charts, which since their alleged first appearance in William Playfair's Statistical Breviary (1801) have consistently inspired professional disdain. The problem, as any statistician will tell you, is that if the first ambition of any visual is to explain and clarify, the humble pie chart only achieves this where the number of data points involved is small (two or at most three) and where nothing especially complicated – like plotting data trends over time – is called for.
This is because people find it much harder to interpret angles than lengths, and are therefore confused by similar categories or more than three categories in a single pie.
Add in colour blindness, which afflicts up to 10 per cent of men, not to mention the problem of the 3D pie, in which the true extent of variables is distorted by the angle of view, and the communication value of pies rapidly diminishes.
Pretty bubbles in the air
As for bubble charts, so cleverly deployed by Rosling, these are far from obvious. They are a wonderful way to illustrate several dimensions simultaneously. In the presentation linked above, the bubbles illustrate three dimensions: their position against the x and y axes maps income and life expectancy, while their area illustrates population size (though again, there is an issue of how well the human brain processes area versus line length).
But while the presentation is well done, it breaks another cardinal rule of statistics: "Graphs are like jokes – if you have to explain them they have failed." It is a view that I can empathise with, having used bubble charts, just once in the last couple of decades, to communicate a moderately complex message to a client. Fine – as long as I was in the room. But little more than five minutes after I left, the client was on the phone, asking me to explain "just one more time".
There are a host of issues implicit in the use of graphs. Use too many – deliver presentations that are just a succession of graphs, one after another – and your clients' eyes will glaze over. Label them clearly: the title should be simple, succinct and make the key point illustrated by the graph; and don't skew interpretation of the numbers with biased labels.
Beware of pictograms, which, like pie charts, can give conflicting information depending on which aspect of the "pict" you choose to scale up. If the quantity represented doubles, do you simply double both the height and the width of your image – in which case, you are actually showing an increase by a factor of four? Or do you multiply both dimensions by 1.414 (a rough approximation to the square root of two), so that the area doubles?
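The pictogram arithmetic is easy to check in a few lines of Python. This is only a sketch: the function names are invented for illustration, and it assumes the reader judges a pictogram by its area. It shows the factor by which height and width should both grow so that the drawn area tracks the quantity.

```python
import math


def linear_scale_for(quantity_ratio: float) -> float:
    """Factor to apply to BOTH height and width so that area grows by quantity_ratio."""
    if quantity_ratio <= 0:
        raise ValueError("quantity_ratio must be positive")
    return math.sqrt(quantity_ratio)


def apparent_area_ratio(linear_factor: float) -> float:
    """Area ratio the viewer sees when both dimensions are multiplied by linear_factor."""
    return linear_factor ** 2


# The quantity doubles.
naive_area = apparent_area_ratio(2.0)   # doubling height and width
honest_factor = linear_scale_for(2.0)   # factor that doubles the area instead

print(f"Doubling both dimensions shows a {naive_area:.0f}x increase")
print(f"Scale both dimensions by {honest_factor:.3f} to show a 2x increase")
```

The same reasoning applies to any ratio: a quantity that triples calls for a linear factor equal to the square root of three, not three.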
Likewise, resist adding 3D effects if they are mere decoration.
As for axes, these are a perennial source of problems. Show the point of origin on both x and y scales, and you risk rendering all but the most egregious variations (in, say, the FTSE 100) as a near-straight line. But truncate them – plot the FTSE over a range from 6800 to 7200, for instance – and you appear to show major fluctuations where only minor ones exist.
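To put rough numbers on the axis problem, the sketch below measures how much of the plot's height a given move occupies under each choice of axis. The 100-point move and the helper name are assumptions made purely for illustration; only the 6800 to 7200 range comes from the example above.

```python
def fraction_of_axis(change: float, axis_min: float, axis_max: float) -> float:
    """Share of the plotted axis height taken up by a move of the given size."""
    if axis_max <= axis_min:
        raise ValueError("axis_max must be greater than axis_min")
    return abs(change) / (axis_max - axis_min)


move = 100.0  # hypothetical index move, in points

truncated = fraction_of_axis(move, 6800.0, 7200.0)  # axis cut to 6800-7200
zero_based = fraction_of_axis(move, 0.0, 7200.0)    # axis starting at the origin
exaggeration = truncated / zero_based

print(f"Truncated axis: the move fills {truncated:.1%} of the plot height")
print(f"Zero-based axis: the move fills {zero_based:.1%} of the plot height")
print(f"The truncated axis makes the move look {exaggeration:.0f} times larger")
```

Neither picture is wrong in itself. The point is that the same move looks trivial or dramatic depending on an axis choice the reader may never notice.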
Don't manipulate the effect illustrated by stretching or compressing axes: use logarithmic scales sparingly. They may be justified for some types of data, but the average manager is likely to have difficulty interpreting them.
Don't cherry pick
Above all, do not cherry-pick data points – not, that is, unless your overall aim is propaganda. This, according to a number of leading climate scientists, is an approach taken on more than one occasion by the Mail on Sunday and Daily Mail in recent years, most famously in 2012 and 2016 when, by highlighting selected data over a short time period, they were able to produce sensational headlines debunking pretty much all of climate science over the past two decades.
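How a short window can contradict a long-run trend is easy to demonstrate on made-up numbers. The series below is synthetic, not real climate data: a steady upward drift with a repeating cycle laid on top, both invented for illustration. The sketch fits a least-squares line to the whole series and then to five hand-picked consecutive points.

```python
import math


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic series: upward drift of 0.02 per period plus a 10-period cycle.
periods = list(range(40))
values = [0.02 * t + 0.3 * math.sin(2 * math.pi * t / 10) for t in periods]

full_slope = ols_slope(periods, values)

# Cherry-picked window: periods 3 to 7, the falling part of one cycle.
window = slice(3, 8)
window_slope = ols_slope(periods[window], values[window])

print(f"Slope over all 40 periods: {full_slope:+.4f}")
print(f"Slope over periods 3 to 7: {window_slope:+.4f}")
```

The full series slopes upward while the chosen window slopes downward, even though both come from exactly the same numbers.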
If there is a real danger of humans getting it wrong when creating data presentations, perhaps it would be better if we removed the human element as far as possible and relied on the growing range of automated analytical tools available to purchase – or for free.
Whatever you need to evaluate – social media or online purchasing effectiveness – there is now a friendly app out there to help you. Meanwhile, how could you possibly manage your overall business without a dashboard complete with KPIs, trend analysis and that most desirable of management assets, "drilldown"?
Well, up to a point. As someone who once ran a small analytical team supporting the marketing efforts of one of the UK's largest financial institutions, I found a seriously depressing aspect of the job to be the ratio of time spent producing the weekly "marketing pack" – a massive exercise in paper printing that seemed always to grow larger, never to be cut back – to time spent actively analysing data.
Visualisation is only ever as good as the underlying data
Managers wanted visuals, and analysts with good degrees in maths or statistics were therefore downgraded to visual creators. No matter how diligently we tried to automate the process, every week saw the presentational demand increase, and the stack of paper circulated to senior managers pile up.
We were using an increasingly rickety structure built on SPSS running over data extracted from the mainframe, when what was needed was a drilldown tool sitting over the raw data.
Dashboards, from Hootsuite to Google Analytics, from SPSS Modeler (formerly Clementine) to SAS BI, are a massive improvement on where we were 10 or 20 years ago.
So why the hesitation? Because analysis begins not with visualisation but with the data. Visualisation belongs to the presentation stage which, if it is to mean anything, should be about presenting the results of analysis to your audience in clear and unequivocal fashion.
Much can go wrong with the presentation stage, although taking care to avoid committing the various visualisation crimes listed here would be a good start. But... and it is the most enormous of buts: presentation, visualisation, whatever you call it, is only ever as good as the underlying data. And before a single graph is drawn or table loaded, some analyst somewhere has given the data the once-over and reached conclusions about it.
Of course, these may be as anodyne as deciding to leave the data exactly as is, but that is as much a view on data quality as identifying and removing invalid or impossible values: individuals whose date of birth suggests they are now entering their third century or, more complex, who have just paid £10m for a one-bedroom cottage in the Outer Hebrides.
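The date-of-birth check is the kind of decision an analyst makes before any chart exists. A minimal sketch follows, with loud assumptions: the record layout, the sample customers and the 120-year ceiling are all hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass
from datetime import date

MAX_PLAUSIBLE_AGE = 120  # assumed ceiling; a real project would agree its own


@dataclass
class Customer:
    customer_id: str
    date_of_birth: date


def age_in_years(dob: date, today: date) -> int:
    """Whole years between dob and today (negative if dob is in the future)."""
    years = today.year - dob.year
    if (today.month, today.day) < (dob.month, dob.day):
        years -= 1  # birthday not yet reached this year
    return years


def split_by_validity(customers, today):
    """Separate records with a plausible age from those needing a second look."""
    valid, suspect = [], []
    for customer in customers:
        age = age_in_years(customer.date_of_birth, today)
        (valid if 0 <= age <= MAX_PLAUSIBLE_AGE else suspect).append(customer)
    return valid, suspect


# Made-up records for illustration only.
as_of = date(2017, 2, 1)
records = [
    Customer("A", date(1975, 6, 15)),  # ordinary adult
    Customer("B", date(1815, 3, 2)),   # "entering their third century"
    Customer("C", date(2030, 1, 1)),   # born in the future
]

valid, suspect = split_by_validity(records, as_of)
print("valid:", [c.customer_id for c in valid])
print("suspect:", [c.customer_id for c in suspect])
```

A single-field rule like this catches the impossible ages. The £10m cottage is harder, because spotting it means comparing price against property type and location, and that calls for domain knowledge rather than a fixed threshold.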
Then there are the outliers, values that very obviously are going to skew the overall picture; and the decision whether to keep or lose them will always be subjective. Yet in working closely with the raw data, an analyst will know what decisions have been taken and be able to compensate.
This is the opposite of what happens when a manager calls up a chart with no familiarity with the underlying data.
In the end, Hans Rosling was undoubtedly a good thing: a man who painted pictures with data and communicated complex ideas to people often lacking the deeper skills to interpret numbers for themselves.
Do not, however, mistake good communication skills for real analysis: and when it comes to understanding your own business, do not for a moment imagine that insight can be wholly automated. ®