Googlebooks crusade captures CAPTCHA king
Fights spam. Pumps OCR
Google has acquired reCAPTCHA, a free CAPTCHA service that also serves as a means of digitizing printed books and newspapers. Among other things, the Mountain View web giant is looking to juice its ever-controversial library-scanning Book Search project.
Google announced the acquisition this morning with a post to the Official Google Blog, and it couldn't help but trumpet the news with, yes, a CAPTCHA:
"The image above is a CAPTCHA — you can read it, but computers have a harder time interpreting the letters. We tried to make it hard for computers to recognize because we wanted to give humans the scoop first, but we're happy to announce to everybody now that Google has acquired reCAPTCHA, a company that provides CAPTCHAs to help protect more than 100,000 websites from spam and fraud," the post reads.
But it's not just spam and fraud protection that interests the Mountain View Chocolate Factory. ReCAPTCHA is also a way for Google to improve the OCR (optical character recognition) technology it uses to digitize printed materials for both its Book Search and News Archive Search services.
In providing websites with CAPTCHAs - visual Turing tests that separate humans from machines - reCAPTCHA often includes text scanned from books and newspapers that can't be read with OCR. It pairs this unknown text with a recognized word or phrase. Website visitors are asked to read both, and if they get the known word correct, reCAPTCHA can assume they also read the unknown text correctly.
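For the curious, the pairing trick can be sketched in a few lines of Python. This is an illustration only, not reCAPTCHA's actual code: the function names, the `scan-17` style identifiers, and the `VOTES_NEEDED` agreement threshold are all assumptions made for the example. The idea is simply that a reading of the unknown word counts only when the same visitor gets the known word right.

```python
from collections import Counter, defaultdict

# Assumption for this sketch: require several matching human readings
# before trusting a transcription of a word OCR could not read.
VOTES_NEEDED = 3

# Maps a (hypothetical) unknown-word id to a tally of proposed readings.
votes = defaultdict(Counter)


def normalize(text):
    """Ignore case and surrounding whitespace when comparing answers."""
    return text.strip().lower()


def check_response(known_answer, known_guess, unknown_id, unknown_guess):
    """Return True if the visitor passes the challenge.

    The visitor is judged only on the known (control) word. If that is
    right, their reading of the unknown word is recorded as one vote.
    """
    if normalize(known_guess) != normalize(known_answer):
        return False  # failed the control word, so discard the other reading
    votes[unknown_id][normalize(unknown_guess)] += 1
    return True


def accepted_transcription(unknown_id):
    """Return the agreed reading, or None if agreement is still too thin."""
    tally = votes[unknown_id]
    if not tally:
        return None
    text, count = tally.most_common(1)[0]
    return text if count >= VOTES_NEEDED else None
```

In this sketch, a visitor who fumbles the control word contributes nothing, so bots and careless typists cannot pollute the transcription of the scanned text.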
ReCAPTCHA - a Pittsburgh, Pennsylvania-based outfit spun off from research that originated at Carnegie Mellon University - is currently helping the New York Times to digitize its archive.
Luis von Ahn, the reCAPTCHA founder who co-authored Google's blog post, is one of the Carnegie Mellon researchers who coined the term CAPTCHA, short for Completely Automated Public Turing test to tell Computers and Humans Apart. ReCAPTCHAs first hit the web in 2007, and von Ahn founded the company in 2008. The Carnegie Mellon assistant computer science professor has not responded to our request for comment.
"Google is the best fit for reCAPTCHA," reads a canned statement from von Ahn tucked into a press release. "From the very start, people often assumed the project was connected to Google, so it only makes sense that reCAPTCHA Inc. ultimately would find a home within Google."
Von Ahn will remain on the Carnegie Mellon computer science faculty, but he will also work at Google's Pittsburgh engineering office, which is on the university's campus. In the press release, he indicated that reCAPTCHA already has close ties with Google. In 2006, Google licensed a von Ahn-developed game for use in its Google Image Labeler. Terms of Google's acquisition were not disclosed. ®