Dr Hannah Fry: We need to be wary of algorithms behind closed doors
UCL researcher on the tragedy of the age of data
Interview Sure, algorithms are insanely useful, but we need to watch we don't become complacent and unable to question them, University College London's Dr Hannah Fry warned in an interview with The Register.
Dr Fry is a lecturer in the mathematics of cities at the Centre for Advanced Spatial Analysis at UCL, where her research "revolves around the study of complex social and economic systems at various scales, from the individual to the urban, regional and the global, and particularly those with a spatial element."
Outside her research, however, Dr Fry is quickly becoming one of the UK's favourite mathematicians, known for her work on BBC Four's The Joy of Data, as well as her popular TED talk, 'The Mathematics of Love', which applied statistical and data-scientific models to dating, sex and marriage.
Chatting to The Register ahead of DataFest2017, the inaugural week-long data science festival in Edinburgh, Dr Fry said she thought the event was going to be "a lot of fun".
"It's perfectly positioned time-wise. It's something people really need to address, and having so many excellent people together in a room at once; it's going to be a great few days."
"Data science as a field has exploded over the past five years," because there's "much more access to data now", said Dr Fry, noting that with "sensors, IoT, with us living more of our lives online" there's now "very little that is untouched by data".
We "realised a few years ago how much data there was," Dr Fry said. "I think the whole thing is very exciting. We have these wonderful opportunities to stand back and rethink how we design our societies, our businesses, almost everything we encounter on a daily basis."
That said, it's still necessary for people to be "paying attention to how biases you have in data can end up feeding through to the analyses you're doing".
Algorithms behind closed doors
Last week, a paper by Julia Powles, an academic at the University of Cambridge – though soon departing for Cornell University in New York – and Hal Hodson, a journalist, described a deal between Google DeepMind and the Royal Free London NHS trust to use patient data without explicit consent as "inexcusable" and potentially in breach of data protection laws.
Dr Fry hadn't read the paper, but believed it was "a conversation that needs to be addressed" especially when it came to ownership of data, access to data, and most importantly, "transparency in terms of the algorithms".
Proprietary software is built with an incentive that might not align with the interests of individual people, who are just data points within it, said Dr Fry. This can be a minor issue or a serious problem, she added, because these algorithms are used in a range of situations, from encouraging consumers to purchase particular products, through to establishing whether individuals get loans or decent insurance rates, and have even been used in the US criminal justice system.
"Algorithms that sit behind closed doors, we need to open those up a bit," said Dr Fry. The issue is that without being able to see how they function, "you can't argue against them. If their assumptions and biases aren't made open to scrutiny then you're putting a system in the hands of a few programmers who have no accountability for the decisions that they're making."
"In some situations, this doesn't matter," Dr Fry acknowledged. "Netflix is not fundamentally important to the structure of society; but then, some algorithms about predicting reoffending rates for individuals in the US are used in sentencing, and the analysis of the data has very serious consequences there.
"An example I use in my talk is of a young man who was convicted of the statutory rape of a young girl – it was a consensual act, but still a statutory crime – and his data was put into this recidivism algorithm and that was used in his sentencing. Because he was so young and it was a sex crime, it judged him to have a higher rate of offending and so he got a custodial sentence. But if he had been 36 instead of 19, he would have received a more lenient sentence, though by any reasonable metric," one might expect a 36-year-old to receive a more punitive sentence.
Collaboration and interest
Dr Fry said the stuff she tends to do "thinks about things from the perspective of the individual in society, rather than as a customer. When designing algorithms as a business owner, your incentive is your profit, something for your business, it's not an incentive to maximise something for the individual. If the two things align then that's great, but generally you're taking care of your business."
The issue is where these two things diverge, when algorithms protect the business rather than individuals, she added. "Classic examples are insurance rates, or banks giving loans, where people from particular backgrounds are very unfairly disadvantaged because of the data category that they're in. You could argue that unfairness extends out to other types of commercial software – there was LinkedIn showing higher paid job advertisements more often to men than women," which was based on dodgy analysis too.
"Inevitably there are biases in data because you can't capture the completeness of the real world. No matter how rich your data sources are, you can't capture the vast richness of reality, and as a result anything you leave out will bias how the world looks through your data. And that's fine, but we have to be aware that that's happening.
"And anytime a programmer makes a decision about how to deal with data, how to average it or clean it, you're imparting more of your own bias on it. Even professionals making their data as impartial as possible, they are expecting the representation of reality that it gives them.
"Sometimes these assumptions and biases can be really hidden, and that can be dangerous," she added, "but at the same time, though, it's not as if live biases don't exist in systems without algorithms and data," noting studies showing that judges have passed harsher sentences just before lunch, or when local football teams have recently lost a game.
It could be sweet
There's a possibility – as in the work of startup Numerai, covered by Wired – to use algorithms within a social system that is "much more open source and collaborative," said Dr Fry. "That's one way to guard against these biases and unintended consequences that can end up having a damaging effect.
"I work in an interdisciplinary department. When you're looking into the data of social systems or how society is structured, the silos that were created a couple of hundred years ago don't apply. It has to be a collaborative effort.
"Imagine life without any algorithms at all: you wouldn't be able to do anything. This is already completely encompassing. We have a habit of over-trusting what mathematicians or computer scientists tell us to do, without questioning it; too much faith in the magical power of analysis.
"I would like people to know more that there are limitations. Algorithms and data should support the human decision, not replace it." ®