Data.gov.uk troupe gets shirty about standards
For the people, just not all of the time :(
The data.gov.uk team has defended the release of datasets that are proving to be standards-unfriendly to developers.
William Perrin and Chris Taggart penned a joint missive on the data.gov.uk site in which they rejected the idea of immediately creating an agreed set of data standards for publishing information online for coders.
"Local authorities should not wait for the process of agreeing standards or ontologies - they should publish now in line with the [Sir Tim] Berners Lee principles noting the guidance we set out. By all means engage in standards setting processes, in the long term if you have the spare resources but the data should be published first," said Perrin and Taggart.
"Standards exercises can be valuable. But they can take months or years, consume scarce resources and blunt early enthusiasm."
The pair then went on to blurt out Sir TBL's two philosophies on slapping data onto the interwebs.
"The top-down one is to make a corporate or national plan, by getting committees together of all the interested parties, and make a consistent set of terms (ontology) into which everything fits. This in fact takes so long it is often never finished, and anyway does not in fact get corporate or national consensus in the end," according to the Greatest Living Briton.
"The other method experience recommends is to do it bottom up. A top-level mandate is extremely valuable, but grass-roots action is essential. Put the data up where it is: join it together later."
In other words, the men advised that developers get on with using the data to help improve "transparency" rather than sit and wait for local authorities and other government players to write up lengthy guidelines on how data should be presented online.
"Rooms full of officials setting abstract standards are unlikely to achieve timely, useful results that provide the transparency Ministers seek by the end of the year," said Perrin and Taggart.
"Transparency is about the practical use of the data by people who are not inside government. This community must be engaged at all stages - both as core beneficiaries and to drive the process to meet their need. It is the people's data, after all."
Cue the arrival of papers garnered from the first "Transparency Board" meeting on 24 June 2010, which were presented to the public in PDF form this morning.
Meanwhile, Taggart - who is the man behind OpenlyLocal.com, the local government equivalent of TheyWorkForYou.com - had a gripe of his own last week about the ConDem coalition's plans to publish all council spending over £500 by January 2011.
"Now, however, with barely the ink dry, the reality is looking not just a bit messy, a bit of a first attempt (which would be fine and understandable given the timescale), but Not Open At All," he grumbled on his 'countculture' blog.
Taggart is also a member of the Local Public Data Panel, which oversees the drafting of guidelines for publishing the local spending data.
He admitted that the group naively presumed that the data would be released "open and free for reuse by all". Instead he was surprised to discover that private company Spikes Cavell was granted what Taggart described as "privileged access" to the data, after it was farmed out by councils.
Access to the information via the SpotlightOnSpend website, he noted, was horribly locked down in the firm's "proprietry [sic] and definitely non-open database".
Not long after Taggart's blog post, which was quickly splashed all over Twitter, the Transparency Board responded with a statement about the whole sorry affair.
It said it was working on draft proposals for releasing public data under a licence that permits free reuse, including commercial reuse.
"Data released under the Freedom of Information Act or the new Right to Data should be automatically released under that licence," it suggested.
"We have already reminded those involved of this principle and the existing availability of the ‘data.gov.uk’ licence which meets its criteria, and we understand that urgent measures are already taking place to rectify the problems identified by Chris [Taggart]."
And then fast forward to Taggart's joint post with Perrin on "avoiding a standards roadblock".
The two issues are of course unrelated. One considers that certain government data - no matter in what form - should be available to the public as soon as possible.
Meanwhile, the other shows just how quickly the corporate world is capable of snaffling 'free' data right from under the noses of happy-clappy coders. ®
It's all very well
To say how wonderful providing all this data would be, but who's going to pay for people to write code to extract the data from all the various different systems, format it in whatever format and publish it, check it for accuracy and then maintain and change it month after month when the source systems change? How much extra do you want put on your Council tax to pay for it?
One of the problems with the late administration was their habit of announcing wonderful initiatives without a frigging clue of how it was going to be paid for. It would have been nice for central government to have gained a sense of reality, but obviously not...
Oh, and please don't bother to comment on the lines of "I could do it all in a week with a perl script/.net script".
"As for extraction, writing something to do extraction should be a one-off thing, which shouldn't take a competent programmer long. In my experience, it's never writing the data extraction that takes a long time,"
Don't be too sure. A contractor friend of mine told of a project where the team leader sank *weeks* of time working up this super-duper ETL tool to do a *single* one-time extraction to populate a new database. Known fixed output database, known fixed database to be input to.
I've used import tools that should have had a simple decision table as their internal design but had a rat's nest of and/or logic instead to parse the file and route the records (yes, I know awk or perl could probably have done the job, but where do you find perl on an i-series?)
We're talking data standards, not extraction. It shouldn't take too many people to choose a common set of basic standards.
As for extraction, writing something to do extraction should be a one-off thing, which shouldn't take a competent programmer long. In my experience, it's never writing the data extraction that takes a long time, it's cleaning up the rubbish that inevitably crufts a database.
Indeed, I tend to find it goes a lot quicker without the inevitable layer of middle management trying to justify their existence. Of course, as Local Government IT is the definition of needless bureaucracy, I suspect that won't be avoidable.