Southampton Uni shows way to a truly open web
Making Berners-Lee's vision a reality
Waiting to exhale
In effect, Berners-Lee advocates want to link up data on the web in an eloquent and constructive way, using something called DBPedia as the central repository for information garnered online. And yes, chillingly for some Reg readers, that does involve using Wikipedia as a major data source. It doesn't just take a 'suck it and see' approach, however; instead, it grabs "structured information" using sophisticated queries against Jimbo Wales's database and, importantly, links other datasets from around the web to Wikipedia.
In other words, it's a bit like telling a web surfer that the populist, if not wholly reliable, online encyclopedia shouldn't be the only source of information. Perhaps proponents of DBPedia would be happy if the database were eventually likened to a detail-obsessed librarian whose middle name is Pedant. That certainly appears to be the goal, at least.
But unlocking information online remains a huge challenge, even though the UK government endorses the linked data ambitions of Berners-Lee, Southampton and others.
"Many research and evaluation projects in the few years of the Semantic Web technologies produced ontologies, and significant data stores, but the data, if available at all, is buried in a zip archive somewhere, rather than being accessible on the web as linked data," explained Berners-Lee back in 2006.
'PDF is an embarrassment to our species'
Even where public information is made available online, problems remain with the data formats all too readily used by local government departments, academic institutions and other parts of the public sector.
"PDF is an embarrassment to our species," Gutteridge says of Adobe's document exchange format, once proprietary but now an open standard.
"PDF is a brilliant way to simulate A4 or portrait views. It was natural to create a new piece of technology to simulate the old ... But our screens are all A4 landscape yet there is this stupid insistence that the portrait way is still developed. It's a legacy thing and we haven't got around to getting rid of it yet. I've been cringing at it for the past 10 years."
The reality, of course, is that PDF is here to stay for now, even as the government tries to shunt local authorities over to publishing data in CSV and other more open, data-friendly formats.
"We can publish papers in a way that anyone can read for free without restriction, it should be open and eventually linked ... It's going to be a long uphill struggle. People are wasting massive amounts of effort by building spreadsheets in each university with the same sort of data and building custom tools," says Gutteridge.
"But you can do so much more in an open model, keeping in mind some things are still commercially sensitive and you still exercise common sense. So I don’t publish my home address or banking details in semantic form, for example ... The only real risk are the people who are used to a closed world and haven't worked out they're saying too much about themselves on Facebook."
Interestingly, all researchers at the University of Southampton are "obliged" to make their data open. "They don't have the right to make it appear only enclosed ... We've shifted the tide, it's not perfect yet," Gutteridge explains.
He admits that the notion of a semantic web is "a challenge because you need to trust your sources".
But Gutteridge prefers to be knee-deep in code.
"Linked data is still semantic web – it's just ditching all the hard stuff. We're not abandoning it, but we're not making it the goal. Ultimately, we provide the tools. Let the politicians do the arguments."
He also concedes: "We will learn down the line that we've cocked up certain ways of doing things with linked data. It's a learning process. Things restructure all the bloody time. A renumbered building, for example, could break the linked data system. It's down to temporal, real-time data. The system's not perfect, but you've got to relax, these are the 404s of the semantic web. For it to work, it has got to work while being a bit broken." ®