Google eyes filing cabinets

Paper files next for the great data hoover

Google has revealed plans to help convert the world's paper filing cabinets, in Tron-like fashion, into mere nodes in the great hive mind.

The firm will be using an optical character recognition program called Tesseract that was found gathering dust in Hewlett-Packard's garage.

"In a nutshell, we are all about making information available to users, and when this information is in a paper document, OCR is the process by which we can convert the pages of this document into text that can then be used for indexing," Google uber techie Luc Vincent said on the firm's code blog today.

Once recognised as one of the three most accurate OCR engines on the market, Tesseract had been out of action since 1995.

HP decided that if it wasn't making any money it was better out than in, and punted it to the Information Science Research Institute at the University of Nevada, Las Vegas to have it restored for an open source release. The uni gave it to Google, where it was quickly assimilated.

The software has some limitations, Vincent said. Comparatively speaking, it's not that accurate any more, it will only read English, it does not like multiple columns or fancy layouts, and it baulks at greyscale and colour documents. But he said it was better than any other open source OCR software.

"Google currently 'reads' almost every web page in the world. Come help us read all the printed material as well!" the firm said in an advertisement for OCR engineers. ®
