Google plonks reCAPTCHA on Street View, makes users ID your house

Human brains tapped for more accurate maps

Exclusive: Google is getting the public to identify house numbers and signs from Street View photos as part of its reCAPTCHA anti-spam technology - and feeding the data into its online mapping service.

Reg reader Jim Allen was first to notice that photos began appearing in the reCAPTCHA tests over the last week or so. These images are obviously numbers on doors, either on apartments or houses, and words from signs. Google has confirmed as much.

reCAPTCHAs collected on 29 and 30 March

The web giant bought the reCAPTCHA system in September 2009 after it was spun off from research on CAPTCHAs* at Carnegie Mellon University.

One long-standing feature of reCAPTCHA is its ability to help decode passages from scanned printed pages: one of the words shown is generated by software and known to reCAPTCHA, while the other image is of an unrecognised word from a difficult scan that a web visitor must identify.

While reCAPTCHA blocks spammers and other miscreants, Google benefits from human-powered optical character recognition.

Many sites including Facebook, TicketMaster and Twitter use Google reCAPTCHA technology as part of their sign-up process. reCAPTCHA has been applied to digitise the archives of The New York Times and tomes for Google Books.

A spokesman for the ad-broking giant confirmed our reader's theory that the images are from Street View photos - which are captured by cameras on roving Google cars and fed into the web giant's online map service. The spokesman said:

We’re currently running an experiment in which characters from Street View images are appearing in CAPTCHAs. We often extract data such as street names and traffic signs from Street View imagery to improve Google Maps with useful information like business addresses and locations.

Based on the data and results of these reCAPTCHA tests, we’ll determine if using imagery might also be an effective way to further refine our tools for fighting machine and bot-related abuse online.

The statement confirms that Google is using the general public to "read" house numbers on Google Street View photos in an extension to its declared purpose of digitising "books, newspapers and old-time radio shows".

Nicholas Johnston, an anti-spam expert at Symantec MessageLabs, said the application of the reCAPTCHA technology to identify building numbers in Street View may impact its effectiveness as an anti-spam tool, though this is far from clear.

"Whether this impacts the security of reCAPTCHA — used on many websites to prevent abuse — depends on exactly how this feature is implemented," Johnston explained.

"reCAPTCHA works by presenting a CAPTCHA image consisting of two words: a known control word and an unknown word. To solve the CAPTCHA, only the control word must be entered correctly. The unknown word gradually gets a higher confidence rating as more users submit the same text for it. reCAPTCHA users don’t know which word is which."
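As a rough illustration of the scheme Johnston describes, the Python sketch below checks only the control word and counts each passing user's reading of the unknown image as a vote. It is not Google's implementation: the `Challenge` class, the function names and the `min_votes` and `min_share` thresholds are all hypothetical, chosen purely for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Challenge:
    """One two-part challenge: a control image and an unknown image."""
    control_answer: str  # text the server already knows
    votes: Counter = field(default_factory=Counter)  # readings of the unknown image


def normalise(text: str) -> str:
    """Ignore case and surrounding whitespace when comparing answers."""
    return text.strip().lower()


def submit(challenge: Challenge, control_guess: str, unknown_guess: str) -> bool:
    """Pass the user only if the control is right; record their vote if they pass."""
    if normalise(control_guess) != normalise(challenge.control_answer):
        return False  # failed the control, so the other reading is discarded
    challenge.votes[normalise(unknown_guess)] += 1
    return True


def best_reading(challenge: Challenge, min_votes: int = 3, min_share: float = 0.75):
    """Return the agreed reading of the unknown image, or None if confidence is low.

    The thresholds are assumptions for this sketch, not published values.
    """
    total = sum(challenge.votes.values())
    if total < min_votes:
        return None
    text, count = challenge.votes.most_common(1)[0]
    return text if count / total >= min_share else None
```

In this sketch a house number would only be accepted once enough users who passed the control had typed the same digits for it.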

Johnston added: "If we assume Google’s objective is to get reCAPTCHA users to determine building numbers from Street View images, these 'number images' would be the unknown part of the CAPTCHA, so users would not have to solve that part. Instead, they would just have to solve the control word.

"However, if the 'number images' are always used as the unknown part of the CAPTCHA, and as the number images are easily distinguishable from the normal words (normal words have a plain white background, building numbers have a complex, photographic background), it could reduce the complexity of programmatically defeating the CAPTCHA, as just an easy-to-determine part of the CAPTCHA would have to be solved.

"If the 'number images' are always the control part of the CAPTCHA, then it would be easier still to solve the CAPTCHA programmatically, as the numbers (at least in the examples we’ve seen) are shorter and use a much smaller set of possible characters."

Much depends on the implementation of number recognition within reCAPTCHA, Johnston cautioned.

"It’s likely that instead of either only the 'number images' or normal words always being used as the control, sometimes number images are used as the control, and sometimes words are used as the control.

"Without knowing which part of the CAPTCHA is the control, anyone trying to defeat it programmatically would have to guess, and would probably have a pretty low overall CAPTCHA solving rate. This change to reCAPTCHA means that anyone attempting to defeat it programmatically must develop specialist OCR for the 'number images' as well," he said. ®
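Johnston's point about guessing can be put in back-of-the-envelope terms. The sketch below assumes each challenge pairs one word image with one number image, that the attacker answers both, and that only the control is checked. The function name and every figure fed into it are hypothetical; none of them are measurements.

```python
def expected_pass_rate(p_number_control: float, acc_words: float, acc_numbers: float) -> float:
    """Expected share of challenges an automated solver passes.

    p_number_control: assumed probability that the number image is the control.
    acc_words:        assumed OCR accuracy on distorted words.
    acc_numbers:      assumed OCR accuracy on photographed building numbers.
    """
    for value in (p_number_control, acc_words, acc_numbers):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all inputs must be probabilities between 0 and 1")
    # The solver passes when it reads whichever image happens to be the control.
    return p_number_control * acc_numbers + (1.0 - p_number_control) * acc_words


# Illustrative only: a solver with 30% accuracy on words and no number OCR,
# facing a control that is a number image half the time.
rate = expected_pass_rate(p_number_control=0.5, acc_words=0.3, acc_numbers=0.0)
```

Under those made-up figures the solver's pass rate is half its word accuracy, which is why an attacker would also need specialist OCR for the number images.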

Bootnote

* CAPTCHAs are a way of proving that a human is visiting a website or online service as opposed to automated software. Two distorted words are shown and punters have to type both of them back into a box to pass through; robots tend to be stumped by this verification process. CAPTCHA is an acronym for "Completely Automated Public Turing test to tell Computers and Humans Apart". The tech is mostly used as a challenge-response test to frustrate the automated sign-up to web mail accounts and similar services.
