Two centuries of Hansard to move online

Web no longer 'somewhere data goes to die'

Parliament hopes to place all Hansard reports - from 1804 to 2004 - online by the end of this year.

Its information management department is using optical character recognition (OCR) technology to turn three million printed pages of the record of Parliamentary proceedings into digitised text. Some is already online, although the project has not yet been officially approved as a version of Hansard.

Edward Wood, Parliament's director of information management, said the department has sliced up original bound copies of Hansard to obtain the pages for scanning – adding that such books are commonly available, as many libraries are selling them.

"For me, it symbolised opening up the data," he told Kable's Electronic Document and Records Management conference.

Wood said the main aim was to avoid expensive conservation work on the printed versions of Hansard used by Parliament's members and staff, but the project should also allow better searching and reduce storage costs.

The process compares the results of three OCR scans, and a contractor proofreads 100 per cent of the output. Parliament also proofreads one per cent to check the quality of the work. Wood said that although the likes of Google and Microsoft have digitised some of Hansard as part of other projects, their work "is not particularly good, on the whole – there's very little metadata".
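The article does not say how the three OCR results are compared. As a purely illustrative sketch, one common approach is a word-level majority vote, with any word on which no two readings agree passed to a human proofreader. The function name `vote_on_line`, the sample readings and the assumption that all three readings split into the same number of words are hypothetical, not details of Parliament's system.

```python
from collections import Counter


def vote_on_line(readings):
    """Merge several OCR readings of one line by word-level majority vote.

    Returns (merged_words, flagged_positions). A position is flagged when
    no two readings agree, so a human proofreader should check it.
    """
    tokenised = [reading.split() for reading in readings]
    # Simplifying assumption: every reading has the same number of words,
    # so words can be compared position by position.
    if len({len(tokens) for tokens in tokenised}) != 1:
        raise ValueError("readings are not word-aligned")

    merged, flagged = [], []
    for position, candidates in enumerate(zip(*tokenised)):
        word, count = Counter(candidates).most_common(1)[0]
        merged.append(word)
        if count < 2:
            flagged.append(position)
    return merged, flagged


# Hypothetical readings of the same printed line from three OCR passes.
readings = [
    "The House met at half past Two",
    "The Honse met at half past Two",
    "The House rnet at half past Two",
]
merged, flagged = vote_on_line(readings)
print(" ".join(merged))
print(flagged)
```

A real pipeline would also need to align readings that differ in word count, which this sketch deliberately rejects rather than guesses at.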

Robert Brook, a developer working on the project, said the system aims to provide excellent metadata, with material linked by bill, MP, constituency and even monarch. "Previously, we've treated the web as somewhere data goes to die," he said, but the aim of this project is to open it to numerous uses.

This has been evident in the eclecticism of searches made by users so far, Brook added. "I expected them to look for Tony Blair and Iraq," he said, but instead popular searches have included Telic (the code name for Britain's operations in Iraq), asbestos use in playgrounds, and Corsham's military communications centre.

Around 95 per cent of searches come through Google. "No one uses our search engine, which is really galling," said Brook. But he added that this means people are finding the nascent system as part of general search, rather than specifically looking for Hansard.

Brook said the system, which relies entirely on open source software and uses open data standards to allow reuse and mash-ups on other websites, will add another decade's worth of material in the next month. If it wins approval, it will eventually "get a portcullis on top", he said, and be adopted as an official archive of Hansard.

This article was originally published at Kablenet.
