Intel demos next-generation voice and gesture interfaces

Offers a million bucks for the best 'perceptual computing' idea

IDF 2012 Intel wants computers to be as smart as humans in how they understand voices and gestures – and it's offering $1m to the best idea that can help achieve that goal.

"Human beings are very rich in the way that they interface with each other, the way they interact with each other," David Perlmutter, the general manager of Intel's Architecture Group, told his audience during his opening keynote at the Intel Developer Forum on Tuesday in San Francisco.

A principal part of that interaction, of course, is voice, and Perlmutter introduced a demo of a Dell XPS 13 Ultrabook running a beta version of Dragon Assistant by Nuance, which he said would be released in a public beta in the fourth quarter of this year and in production by the first quarter of 2013.

In the demo, Dragon Assistant responded to simple voice requests, such as being asked to search for pictures of San Francisco on Google, as well as performing more-complex activities such as looking for sunglasses on Amazon then sharing the link of the Amazon results page, along with a voice-to-text Twitter message asking followers for suggestions.

In each case, Dragon Assistant's female voice identified – Siri-like – the search results. Interestingly, the demo showed how the voice-recognition technology could also correctly understand poor grammar and relatively poor pronunciation of foreign song titles when asked to play specific tunes.

But people don't just communicate by voice, Perlmutter said. "They don't just use voice, they don't just use handwriting, they don't just use touch," he said. "They use gestures: handshake gestures, hand gestures, finger gestures."

Gestural interaction was demoed using a compact USB-powered Creative 3D camera coupled with SoftKinetic's 3D gesture-recognition middleware. In its first iteration, Perlmutter said, the camera will be a separate unit mounted on top of, say, a laptop display, but 3D-recognition cameras of the future will be integrated into laptops – and, one assumes, Ultrabooks.

Intel's David Perlmutter demonstrating gestural recognition at the Intel Developer Forum

Perlmutter: Each finger has a role in gestural recognition

The demo showed that the camera and software are capable of recognizing not only large gestures, but finger gestures as well. Such recognition capabilities, Perlmutter said, were "just the beginning" of gestural interaction.

At Intel Labs, he said, there's work being done on the ability to play virtual catch with virtual objects. "I thought about what will happen if I had all these virtual objects," Perlmutter said, "and I can have a discussion with Skype or whatever other video-conferencing capability with my granddaughter – I will be able to play with her across the ocean."

To grease the skids of what the company calls "perceptual computing" – touch, voice, fine-grained gesture recognition, facial and object recognition, and other modes of human-computing interaction – Intel will soon make available a perceptual computing SDK for use with Creative's Interactive Gesture Camera Developer Kit, including a 3D HD camera that has the ability to interpret gestures between roughly 6 inches and 3 feet.

In addition, Intel will host a Perceptual Computing Challenge with a prize of $1m in awards and promotions for the best submission. The contest will go live in the fourth quarter of this year, and will debut on Intel's Perceptual Computing website.

Perlmutter also offered an idea of what he hopes will come after perceptual computing. In addition to being able to understand your gestures, computers should also be able to figure out what you want without you having to be thoroughly specific in your requests, he said. "I call it my 'wife dream'. I'll figure out and guess what she wants and be ready to go when she just wants it."

Computers should be able to do the same, he said. But first they need to respond to your touch, do what you tell them to with your voice, and respond to your gestures, whether those gestures be as broad as those made by your entire body, or as subtle as a wiggle of your finger. ®
