
Two centuries of Hansard to move online

Web no longer 'somewhere data goes to die'

Parliament hopes to place all Hansard reports from 1804 to 2004 online by the end of this year.

Its information management department is using optical character recognition (OCR) technology to turn three million printed pages of the record of Parliamentary proceedings into digitised text. Some of the text is already online, although the project's output has not yet been officially approved as a version of Hansard.

Edward Wood, Parliament's director of information management, said the department has sliced up original bound copies of Hansard to obtain the pages for scanning – adding that such books are commonly available, as many libraries are selling them.

"For me, it symbolised opening up the data," he told Kable's Electronic Document and Records Management conference.

Wood said the main aim was to avoid expensive conservation work on the printed versions of Hansard used by Parliament's members and staff, but the project should also allow better searching and reduce storage costs.

The process compares the results of three OCR scans, and a contractor then proofreads 100 per cent of the output. Parliament also proofreads one per cent itself to check the quality of the work. Wood said that although the likes of Google and Microsoft have digitised some of Hansard as part of other projects, their work "is not particularly good, on the whole – there's very little metadata".
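The article does not say how the three OCR scans are compared. One common approach is a word-by-word majority vote, sketched below in Python purely as an illustration. The function name `vote_ocr` is hypothetical, and the sketch assumes the three passes are already aligned word for word, which real OCR output often is not.

```python
from collections import Counter


def vote_ocr(passes):
    """Majority-vote across OCR passes (hypothetical helper, not Parliament's code).

    Assumes every pass contains the same number of whitespace-separated words.
    Returns (consensus_words, flagged_positions). A position is flagged when
    no single reading is shared by a strict majority of the passes, so a
    human proofreader should look at it first.
    """
    token_lists = [p.split() for p in passes]
    if len({len(tokens) for tokens in token_lists}) != 1:
        raise ValueError("passes must contain the same number of words")

    consensus, flagged = [], []
    for position, readings in enumerate(zip(*token_lists)):
        word, count = Counter(readings).most_common(1)[0]
        consensus.append(word)
        if count * 2 <= len(readings):  # no strict majority
            flagged.append(position)
    return consensus, flagged


# Three passes over the same line, each with a different misread
passes = [
    "The House met at half past two",
    "The Honse met at half past two",
    "The House rnet at half past two",
]
words, flagged = vote_ocr(passes)
print(" ".join(words))  # The House met at half past two
print(flagged)          # []
```

A vote like this only catches errors that the scans do not share, which may be one reason the project still has every page proofread.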

Robert Brook, a developer working on the project, said the system aims to provide excellent metadata, with material linked by bill, MP, constituency and even monarch. "Previously, we've treated the web as somewhere data goes to die," he said, but the aim of this project is to open the data to numerous uses.

This has been evident in the eclecticism of searches made by users so far, Brook added. "I expected them to look for Tony Blair and Iraq," he said, but instead popular searches have included Telic, the code name for Britain's operations in Iraq, asbestos use in playgrounds, and Corsham's military communications centre.

Around 95 per cent of searches come through Google. "No one uses our search engine, which is really galling," said Brook. But he added that this means people are finding the nascent system as part of general search, rather than specifically looking for Hansard.

Brook said the system, which relies entirely on open source software and uses open data standards to allow reuse and mash-ups on other websites, will add another decade's worth of material in the next month. If it wins approval, it will eventually "get a portcullis on top", he said, and be adopted as an official archive of Hansard.

This article was originally published at Kablenet.

