Data.gov.uk chief admits transparency concerns
Raw info may be too confusing
The head of the government's website for the release of public sector data has said it is a challenge to ensure that users can understand the statistics.
Cabinet Office official Richard Stirling, who leads the team that runs Data.gov.uk, said that if he were at the Office for National Statistics he would have concerns about statistical releases and people making assumptions "that aren't quite valid".
Speaking in a podcast published on Data.gov.uk on 29 July 2010, he expressed a worry about whether "people understood the accuracy of the methodology and were able to draw sensible conclusions".
"We've tried to take some sensible steps to address that. One is through putting a short two or three paragraph description of what the dataset is on Data.gov.uk alongside the files and also a link to more detailed guidance," Stirling said.
On the subject of whether the government should move towards providing aggregated data rather than raw information that the public may not understand, he said that the coalition plans to "put more work into doing that" in some situations.
"I think what is likely to happen is datasets where there is an awful lot of interest and we can see that there is interest and there is a lot of interest in the visualisation of it, then I think we'll put more work into doing that web front end," the civil servant added.
But Stirling said he was mindful of not duplicating aggregation work that had already been carried out by the public.
"You've got to ask yourself, 'Is there any value in us replicating that and should we showcase what someone else has done?'" he added.
This article was originally published at Kable.
Kable's GC weekly is a free email newsletter covering the latest news and analysis of public sector technology.
Keep on keeping on
The alternative of publishing summaries is fraught with dangers. What was the methodology behind the analysis? Were the most appropriate statistical methods used? Who says so?
I say keep that raw information coming. Once it's out, it's out forever and can be checked retrospectively.
Despite the reservations, it's cheaper too. A process to publish raw data can be created and run month after month, year after year. An analysis requires consideration, discussion and review, committees to agree the findings, etc. All very costly.
Bit like OSS
I can see the guy's point - why spend public money putting raw data out there that very few people can really understand and be bothered to take the time to understand? That money has to come from somewhere.
On the other hand openness is good. Seems like they are looking to be open but aren't sure of the best way to go about it. Which is a good thing!
Cost permitting, releasing both together would seem the best solution - their work can easily be checked by anyone with the relevant skill set and time on their hands, while those people for whom stats is an arcane and interminably boring art can just download the summary, knowing that it has been scrutinised.
Kinda like OSS really - some guys check the src, but most ppl just download the bin from the official site and rely on the few to check it's all fine and dandy.
A couple of important principles?
1 - full datasets of raw and/or ordered data are available to the public
(advantages: independent data analysis or commissioned commercial analysis on publicly funded data; universities or other training establishments to run courses incorporating analysis of real data where appropriate)
2 - summarised form of data presented
(advantage: data analysis by ONS available for scrutiny when/where required)
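The two principles above amount to a simple publication pattern: release the full raw dataset alongside a derived summary, so anyone can re-run the aggregation and check it. A minimal sketch in plain Python, assuming a toy dataset (the region/value fields and mean-per-region aggregation are made up for illustration, not anything ONS actually publishes):

```python
import csv
import statistics
from io import StringIO

# Hypothetical raw records; the field names are illustrative only.
raw_records = [
    {"region": "North", "value": 10},
    {"region": "North", "value": 14},
    {"region": "South", "value": 7},
    {"region": "South", "value": 9},
]

def publish_raw(records):
    """Principle 1: emit the full raw dataset as CSV text."""
    buf = StringIO()
    writer = csv.DictWriter(buf, fieldnames=["region", "value"])
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

def publish_summary(records):
    """Principle 2: emit a summarised form (here, mean value per region)."""
    by_region = {}
    for rec in records:
        by_region.setdefault(rec["region"], []).append(rec["value"])
    return {region: statistics.mean(values)
            for region, values in sorted(by_region.items())}

raw_csv = publish_raw(raw_records)     # anyone can recompute the summary from this
summary = publish_summary(raw_records)
print(summary)  # {'North': 12, 'South': 8}
```

Because the summary is a pure function of the raw release, a third party holding only `raw_csv` can reproduce and scrutinise the official figures, which is exactly the check the principles are after.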
There are two main strands - possibly three:
the source data
analysis compiled on source data
decisions influenced or based on analysis of source data