Judge cracks down on Bayesian stats dodginess in court

Terry Pratchett effect angers beak

Analysis A judge in a (sadly unnamed) British case has decided that Bayes' Theorem - a formula used in court to calculate the odds of whodunnit - shouldn't be used in criminal trials.

Or at least, it shouldn't be relied upon as it has been in recent years: according to the judge, before any expert witness plugs data into the theorem to brief the jury on the likelihood that a defendant is guilty, the underlying statistics should be "firm" rather than rough estimates. The decision could affect things like the odds of matching drug traces, fibres from clothes and footprints to an alleged perp, although not DNA.

In a murder appeal case, brought after a man was convicted on the basis of his footwear almost matching a print linked to the crime, this precise point was made:

The data needed to run these kinds of calculations, though, isn't always available. And this is where the expert in this case came under fire. The judge complained that he couldn't say exactly how many of one particular type of Nike trainer there are in the country. National sales figures for sports shoes are just rough estimates.
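To see why a rough sales figure matters, here is a minimal sketch of the odds form of Bayes' Theorem. Every number in it is made up for illustration: the population of 65 million, the prior odds of 1 to 1,000 and the two guesses at how many pairs of the trainer are in circulation are assumptions, not figures from the case, and the function name is hypothetical.

```python
from fractions import Fraction

def posterior_probability(prior_odds, population, pairs_in_circulation):
    """Odds form of Bayes: posterior odds = prior odds x likelihood ratio."""
    # Chance that a random person owns a matching pair
    # (assumes one pair per owner, purely for illustration).
    match_frequency = Fraction(pairs_in_circulation, population)
    # Assumes the real culprit's shoe would certainly match the print.
    likelihood_ratio = 1 / match_frequency
    posterior_odds = prior_odds * likelihood_ratio
    # Convert odds back into a probability.
    return posterior_odds / (1 + posterior_odds)

prior_odds = Fraction(1, 1000)  # hypothetical prior odds of guilt
population = 65_000_000         # hypothetical population

# Two rough guesses at the number of pairs sold (both made up).
few_pairs = posterior_probability(prior_odds, population, 100_000)
many_pairs = posterior_probability(prior_odds, population, 1_000_000)

print(float(few_pairs), float(many_pairs))
```

Under these assumptions a tenfold disagreement about the number of pairs moves the answer from roughly 39 per cent to roughly 6 per cent, which illustrates why the underlying statistics need to be firm.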

Mathematically leaning Reg readers will be able to make much more sense of the details than I as a mere journo will be able to. But from a legal point of view this looks like a good ruling.

Yes, Bayes' Theorem can indeed be useful for making estimates that give us a good idea of what is likely to have happened. However, that's not quite the same as giving us the information leading to “beyond reasonable doubt”, which is what we require before locking someone up.

More than that, the way that the statistics are presented can be more than a tad misleading. For a start, the jury is made up of the general population, not exactly a hotbed of sophisticated statistical reasoning, and being told by experts that there's a one in a million chance leads to an all too common error.

A DNA match to one in a million does not mean that it's a million to one in favour of the bloke 'aving done it, m'lud. Rather, it means that in a population of 65 million, 65 people, based purely on the DNA, could have done it. Our DNA tests thus mean that we now have to go and exclude the other 64, or at least regard all 65 as the prime pool of suspects, not convict our man in the dock purely on the basis that one in a million is beyond that reasonable doubt. Yes, these sorts of mistakes are made in the chain of reasoning.
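The arithmetic behind that point can be sketched in a few lines. This is an illustration only: it assumes, as the paragraph above does, a population of 65 million and a one in a million match rate, and it adds the simplifying assumption that, on the DNA alone, every matching person is equally likely to be the culprit. The variable names are mine.

```python
from fractions import Fraction

population = 65_000_000
match_rate = Fraction(1, 1_000_000)  # the "one in a million" DNA match

# How many people in the whole population would be expected to match.
expected_matches = population * match_rate

# Assumption: on the DNA alone, each matching person is equally likely
# to be the culprit, so the man in the dock is one candidate among them.
chance_it_was_him = 1 / expected_matches

print(expected_matches)          # number of people who match
print(float(chance_it_was_him))  # nowhere near "a million to one"
```

On those assumptions the DNA by itself puts the chance at 1 in 65, about 1.5 per cent, and it takes other evidence to narrow the pool any further.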

It can get worse, of course - mentioning no names, no pack drill as it's still a case that gets people het up. Say the likelihood of any one child dying a cot death is 1 in 79,000 (entirely made up number for illustrative purposes). The chance of two children from the same family dying of cot death is thus 1 in 79,000 x 79,000, which is 1 in 6,241,000,000. One in six billion, so, members of the jury, you know what to do: lock up the mum.

This was actually the logic used by one eminent expert witness. The appeal was eventually allowed, some years later, when it was pointed out that cot death might not actually be an independent event: perhaps there is a genetic predisposition to it, or perhaps the environment means that one cot death increases the chance of a second one. Given one cot death, the chance of a second might be as high as 1 in 1,000 or 1 in 2,000 (again, made up numbers), odds which we most certainly wouldn't want to use as the basis of “beyond all reasonable doubt”.
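A short sketch shows how much the independence assumption matters, using the same made up numbers: 1 in 79,000 for a first cot death and 1 in 1,000 for a second one given a first. The variable names are mine.

```python
from fractions import Fraction

p_first = Fraction(1, 79_000)           # made up chance of one cot death
p_second_given_first = Fraction(1, 1_000)  # made up chance of a second, given a first

# Treating the two deaths as independent: just square the first figure.
independent = p_first * p_first

# Allowing for dependence: multiply by the conditional chance instead.
dependent = p_first * p_second_given_first

# How many times too small the "independent" figure is.
understatement = dependent / independent

print(independent, dependent, understatement)
```

With these figures the independent calculation gives 1 in 6,241,000,000 while the dependent one gives 1 in 79,000,000, so squaring the odds understates the chance of two natural deaths by a factor of 79.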

Neither I, the judge, nor anyone else has any serious doubt about the usefulness of Bayesian reasoning in evaluating evidence in court cases. But the techniques have been so badly understood, even by experts, in recent years that a rethink, a stop and a reasoning through all of the implications, doesn't sound like a bad idea. ®
