CMU promises to fix speech recognition with a chip

A chip made by a robotic flying car

Hot Chips: Speech technology ranks right down there with flying cars, robots and Windows as the grandest of disappointments in geekdom. Thankfully, the horrid state of the technology hasn't broken the will of all researchers in the speech field.

In fact, one team at Carnegie Mellon University optimistically thinks they may have solved the speech recognition conundrum with a new chip.

Armed with a $1m grant from the National Science Foundation, CMU's In Silico Vox team has set the modest goal of showing a 100 to 1,000 times improvement in the performance of speech recognition systems. Such a leap would improve the quality of speech technology to the point where it would be feasible to place sophisticated speech engines in devices such as cell phones or PDAs. Rob Rutenbar, a professor at CMU, unveiled the processor that is key to the project's end goal at the Hot Chips conference today.

"It's just a bad idea trying to push this technology in software only," Rutenbar said. "Most of the applications of tomorrow don't want 20 to 30 per cent better performance. They want factors of 100 or factors of 1,000."

Rutenbar likened the move to create a speech chip to the well-established practice of creating specialized processors to deal with graphics operations.

"Nobody paints pixels in software," he said. "You would have to be nuts. Videos from ESPN are not painted on your cell phone screen by software. There's a small graphics engine doing that."

Some companies have produced decent speech recognition software for large call centers and automated phone systems. These packages, however, require far more processing power than you're likely to find on smaller computing devices.

The speech systems must compare 50 main sounds used in typical conversation against thousands of permutations on these sounds made when people pronounce words in different ways. The speech engines then run through a database of common two- and three-word combinations against a backdrop of some 50,000 different words to come up with strong matches for what a person is actually saying. All told, this process chews through processor, memory and energy resources. That's bad news for a cell phone designer.
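For readers who want a feel for the word-combination step, here is a minimal Python sketch of how a recognizer might weigh sound-alike words against a table of two-word combinations. It is not CMU's design: the function name `best_sentence`, the `BIGRAM_PROB` table, the candidate words and every probability below are invented for illustration, and a real engine would work with tens of thousands of words and three-word combinations as well.

```python
import math

# Hypothetical two-word (bigram) probabilities; "<s>" marks the sentence start.
# All numbers are made up for this sketch.
BIGRAM_PROB = {
    ("<s>", "when"): 0.5,
    ("when", "will"): 0.4,
    ("will", "windows"): 0.1,
    ("will", "widows"): 0.001,
    ("windows", "arrive"): 0.2,
    ("widows", "arrive"): 0.05,
}
UNSEEN_PROB = 1e-6  # fallback for word pairs missing from the table


def best_sentence(candidates):
    """Pick the most likely word sequence with a Viterbi-style search.

    candidates: one dict per spoken word, mapping each sound-alike
    candidate to an (assumed) acoustic match probability.
    """
    # best[word] = (log score of the best path ending in word, that path)
    best = {"<s>": (0.0, [])}
    for slot in candidates:
        nxt = {}
        for word, acoustic in slot.items():
            # Combine path score, bigram score and acoustic score in log space.
            score, path = max(
                (
                    (
                        prev_score
                        + math.log(BIGRAM_PROB.get((prev, word), UNSEEN_PROB))
                        + math.log(acoustic),
                        prev_path,
                    )
                    for prev, (prev_score, prev_path) in best.items()
                ),
                key=lambda t: t[0],
            )
            nxt[word] = (score, path + [word])
        best = nxt
    return max(best.values(), key=lambda t: t[0])[1]


# Acoustically, "widows" scores higher than "windows" here, but the
# word-pair table pulls the result back to the sensible sentence.
CANDIDATES = [
    {"when": 0.9, "wren": 0.1},
    {"will": 0.8, "well": 0.2},
    {"windows": 0.4, "widows": 0.6},
    {"arrive": 1.0},
]
print(" ".join(best_sentence(CANDIDATES)))  # prints "when will windows arrive"
```

Even in this toy, every candidate word is scored against every surviving path at every step. Scale that up to a large vocabulary and the appetite for processor cycles and memory becomes easy to picture.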

The CMU team, however, has already created a lightweight hardware speech engine based on an FPGA (Field Programmable Gate Array) from Xilinx that solves many of these problems. Rutenbar showed the chip in action, successfully converting the question, "When will Windows arrive?" into text on the screen.

Right now, the processor can only handle about 1,000 words at a modest speed. By the end of the year, CMU hopes to create a larger FPGA system capable of dealing with 5,000 words in real time. Then, next year it will march to 10,000 and 50,000 words on the FPGA system, while exploring full-fledged silicon designs. In an interview with The Register, Rutenbar said the project could eventually result in a start-up.

Along with the NSF, DARPA and the Department of Homeland Security have put money into the project, which seems to have some possible military uses.

"Homeland security applications are the big reason we were chosen for this award," Rutenbar said. "Imagine if an emergency responder could query a critical online database with voice alone, without returning to a vehicle, in a noisy and dangerous environment. The possibilities are endless."

Government officials might also use such a speech recognition engine to scan phone calls – always a pleasing thought.

So, here's to speech recognition being solved. Now, start hoisting your flying car. ®
