Researchers find irreparable flaw in popular CAPTCHAs

Decaptcha pierces Live.com, Yahoo!, Digg

Computer scientists have developed software that easily defeats audio CAPTCHAs offered on account registration pages of a half-dozen popular websites by exploiting inherent weaknesses in the automated tests designed to prevent fraud.

Decaptcha is a two-phase audio-CAPTCHA solver that breaks the puzzles with success rates of 41 percent to 89 percent on sites including eBay, Yahoo, Digg, Authorize.net, and Microsoft's Live.com. The program works by removing background noise from the audio files, leaving only the spoken characters needed to complete the test.
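To make the noise-removal idea concrete, here is a minimal sketch of one simple way to separate loud speech from quieter background noise: measure the energy of short windows of audio and keep only the stretches that exceed a threshold. It is an illustration under stated assumptions, not the researchers' implementation. The function names, the window size, and the threshold are all hypothetical, and the toy signal stands in for real audio samples.

```python
import math

def window_energy(samples, window):
    """Return the RMS energy of each consecutive, non-overlapping window."""
    energies = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        energies.append(math.sqrt(sum(s * s for s in chunk) / len(chunk)))
    return energies

def find_loud_segments(samples, window=4, threshold=0.5):
    """Return (start, end) sample indices for runs of windows above threshold.

    The window size and threshold are illustrative assumptions; a real
    solver would tune them for each CAPTCHA scheme.
    """
    segments = []
    current_start = None
    for i, energy in enumerate(window_energy(samples, window)):
        if energy > threshold:
            if current_start is None:
                current_start = i * window  # a loud run begins here
        elif current_start is not None:
            segments.append((current_start, i * window))  # the run just ended
            current_start = None
    if current_start is not None:
        segments.append((current_start, len(samples)))  # run reaches the end
    return segments

# Toy signal: quiet noise, a loud burst, quiet noise, a second loud burst.
signal = [0.1, -0.1] * 4 + [1.0, -1.0] * 4 + [0.1, -0.1] * 4 + [0.9, -0.9] * 2
segments = find_loud_segments(signal)
print(segments)
```

In this toy case the two loud bursts are recovered as separate segments, each of which could then be passed to a recognizer. Constant, low-level noise is exactly the kind of distortion that such thresholding discards.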

In virtually all of the tests, Decaptcha was able to correctly solve the puzzle at least once in every 100 attempts, making the technique suitable for botmasters with large armies of compromised computers. The high success rate was largely the result of the ease of removing sound distortions known as background noise, intermediate noise, and constant noise, which are inserted into the audio to throw off speech-recognition programs. Most audio-based CAPTCHA systems are wide open to the attack, with the notable exception of the Google-owned Recaptcha.net, which uses a different approach known as semantic noise.
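A quick back-of-the-envelope calculation shows why even a success rate of one in 100 is useful to an attacker who can make many attempts. The 1 percent figure is the floor reported above; the botnet size and the number of attempts per machine are hypothetical values chosen only for illustration.

```python
def p_at_least_one(p_single, attempts):
    """Probability that at least one of `attempts` independent tries succeeds."""
    return 1.0 - (1.0 - p_single) ** attempts

def expected_solves(p_single, attempts):
    """Expected number of solved CAPTCHAs over `attempts` independent tries."""
    return p_single * attempts

floor_rate = 0.01        # one success per 100 attempts, the floor cited above
bots = 10_000            # hypothetical botnet size (assumption)
attempts_per_bot = 100   # hypothetical attempts per machine (assumption)

single_bot_chance = p_at_least_one(floor_rate, attempts_per_bot)
total_expected = expected_solves(floor_rate, bots * attempts_per_bot)

print(f"Chance one bot succeeds at least once: {single_bot_chance:.3f}")
print(f"Expected solves across the botnet: {total_expected:.0f}")
```

Under these assumptions a single machine has roughly a 63 percent chance of getting through at least once, and the botnet as a whole would expect about 10,000 solved puzzles. The calculation assumes attempts are independent and that the site does not throttle or block repeated failures.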

"Our results indicate that non-continuous audio captcha schemes built using current methods (without semantic noise) are inherently insecure," the scientists wrote in a recently published research paper. "As a result, we suspect that it may not be possible to design secure audio captchas that are usable by humans using current methods. It is therefore important to explore alternative approaches."

Decaptcha uses a supervised algorithm that must be trained for each CAPTCHA scheme being targeted. Training requires feeding a set of puzzles with their answers into the program. Once trained, Decaptcha identifies the sound shapes in an audio file by comparing them to a large sample of sounds already cataloged. The researchers generated 4.2 million audio CAPTCHAs for their experiments.
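The train-then-compare workflow described above can be sketched with a very simple supervised classifier. The example below uses a nearest-centroid approach as a stand-in; the article does not say which features or classifier Decaptcha uses, so the two-number feature vectors, the labels, and the function names here are all hypothetical.

```python
import math

def train_centroids(examples):
    """Average the feature vectors for each label.

    `examples` is a list of (feature_vector, label) pairs, standing in for
    puzzles supplied together with their known answers.
    """
    sums = {}
    counts = {}
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: [v / counts[label] for v in total]
        for label, total in sums.items()
    }

def classify(centroids, features):
    """Return the label whose centroid is closest to `features`."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))

# Hypothetical training set: made-up features for the spoken digits 3 and 7.
training = [
    ([1.0, 0.1], "3"), ([0.9, 0.2], "3"),
    ([0.1, 1.0], "7"), ([0.2, 0.8], "7"),
]
centroids = train_centroids(training)

print(classify(centroids, [0.8, 0.3]))
print(classify(centroids, [0.0, 0.9]))
```

The need to retrain for each scheme follows naturally from this design: the cataloged sounds only match a new recording if both come from the same voice and distortion style.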

The paper is only the latest reminder of the flaws in CAPTCHAs, which are designed to prevent scripts from registering email accounts and carrying out other automated attacks by presenting the user with a problem that's hard for computers to solve. Real-world attacks against audio CAPTCHAs from Microsoft have already been used by the Pushdo spam botnet to create fraudulent email accounts on Live.com. More traditional CAPTCHAs, which require a user to recognize a word buried in a distorted image, have been successfully defeated for years, with one of the more recent examples being an optical character recognition attack on Google.

After attacks come to light, website operators typically make changes that block the specific technique. Researchers then revise their attacks, requiring more changes to be made in the targeted CAPTCHA schemes.

The latest research suggests web developers may have to make permanent changes to the audio CAPTCHAs, which are offered for visually impaired users.

"Our experiments with commercial and synthetic captchas indicate that the present methodology for building audio captchas may not be rectifiable," they wrote. "Besides Recaptcha, all of the commercial schemes we tested used combinations of constant and regular noise as distortions. All in all, computers may actually be more resilient than humans to constant and regular noise so any schemes that rely on these distortions will be inherently insecure."

The paper was authored by Elie Bursztein, Hristo Paskov, and John Mitchell of Stanford University, Romain Beauxis of Tulane University, Daniele Perito of INRIA and Celine Fabry. A PDF of the report is here. ®
