Microsoft: Our AI speech recognition mangles your words the least

Cortana may be due an upgrade

Microsoft researchers working on AI speech recognition have reached a word error rate of 6.3 per cent, which they claim is the lowest in the industry.
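
Word error rate is the number of word substitutions, deletions and insertions needed to turn a recogniser's output into a reference transcript, divided by the number of words in the reference. Below is a minimal Python sketch that computes it with a word-level edit distance. It assumes whitespace-separated words and applies no text normalisation; NIST's official scoring tools use their own rules, which this ignores.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dist[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words (Levenshtein distance over words)
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match if cost is 0)
            )
    return dist[len(ref)][len(hyp)] / len(ref)
```

For example, comparing "the cat sat on the mat" with "the cat sat on mat" gives 1/6, about 0.167: one deleted word out of six reference words. Because insertions count as errors, the rate can exceed 1.0 when a system outputs many spurious words.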

Hot on the heels of Google DeepMind announcing a "breakthrough" in AI-generated speech, Microsoft was quick to respond, saying that it, too, has reached a "milestone" using neural networks.

A paper released on arXiv shows that the researchers combined "neural-network-based acoustic and language modelling" and tested the result on the US National Institute of Standards and Technology (NIST) 2000 Switchboard task, a conversational telephone speech recognition test used as an industry benchmark.

Artificial neural networks, which are loosely modelled on how scientists believe the brain performs calculations, are often used in speech processing.

Microsoft used a mixture of convolutional neural networks, which are feed-forward (their connections do not form a closed cycle of nodes), and a recurrent neural network, whose connections do loop back so that earlier inputs can influence later outputs. Both types can learn from the large data sets used to train networks to process language, although training them requires a lot of computing power.
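
The difference between a feed-forward connection and a recurrent one can be sketched with toy scalar units in Python. This is a hypothetical illustration, not Microsoft's model: a single tanh unit stands in for a feed-forward layer (it performs no convolution), and the weights are arbitrary assumptions.

```python
import math


def feedforward_step(x: float, w: float = 0.5) -> float:
    # Feed-forward: the output depends only on the current input;
    # no connection loops back into the unit.
    return math.tanh(w * x)


def recurrent_run(xs: list[float], w_in: float = 0.5, w_rec: float = 0.8) -> list[float]:
    # Recurrent: the hidden state h is fed back into the next step,
    # forming a cycle that lets earlier inputs affect later outputs.
    h = 0.0
    outputs = []
    for x in xs:
        h = math.tanh(w_in * x + w_rec * h)
        outputs.append(h)
    return outputs
```

Feeding the sequence [1.0, 0.0, 0.0] through both shows the point: the feed-forward unit returns 0.0 for the two zero inputs, while the recurrent unit still produces non-zero outputs because the first input is carried forward in its hidden state.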

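Human language is difficult for computers to understand. To make the task easier, recognisers rely on statistical models: acoustic models relate the audio signal to the sounds of speech, while language models predict the probability distribution over sequences of words, so the system can favour word sequences that are likely to occur.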
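
The idea of a language model can be illustrated with the simplest kind, a count-based bigram model, which estimates the probability of each word given the one before it. Microsoft's system used neural-network language models, so this Python sketch swaps in a much simpler technique purely for illustration; the toy sentences and the `<s>`/`</s>` boundary markers are assumptions, and no smoothing is applied.

```python
from collections import Counter, defaultdict


def train_bigram(sentences: list[list[str]]) -> dict[str, dict[str, float]]:
    """Estimate P(next word | previous word) from raw counts (no smoothing)."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for words in sentences:
        padded = ["<s>"] + words + ["</s>"]  # mark sentence start and end
        for prev, nxt in zip(padded, padded[1:]):
            counts[prev][nxt] += 1
    model = {}
    for prev, nxt_counts in counts.items():
        total = sum(nxt_counts.values())
        model[prev] = {word: c / total for word, c in nxt_counts.items()}
    return model


def sentence_probability(model: dict[str, dict[str, float]], words: list[str]) -> float:
    """Multiply the bigram probabilities along the sentence; unseen pairs give 0."""
    padded = ["<s>"] + words + ["</s>"]
    prob = 1.0
    for prev, nxt in zip(padded, padded[1:]):
        prob *= model.get(prev, {}).get(nxt, 0.0)
    return prob
```

Trained on the three toy sentences "i like tea", "i like coffee" and "you like tea", the model gives "i like tea" a probability of 4/9 and "i like coffee" 2/9, so a recogniser weighing the two candidates would lean towards the first. A word order never seen in training, such as "tea like i", scores zero, which is why practical models add smoothing.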

Microsoft uses a 30,000-word vocabulary, drawn from the most common words in the Switchboard and Fisher corpora, to recognise words in speech.
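
Building a vocabulary from the most frequent words in a set of transcripts can be sketched in a few lines of Python. This is a hypothetical illustration: the toy transcripts stand in for the Switchboard and Fisher corpora, lowercasing and whitespace tokenisation are assumptions, and mapping unknown words to an `<unk>` token is a common convention rather than a detail reported by Microsoft.

```python
from collections import Counter


def build_vocabulary(corpora: list[list[str]], size: int) -> set[str]:
    """Keep the `size` most frequent words across all corpora (hypothetical helper)."""
    counts = Counter()
    for corpus in corpora:
        for transcript in corpus:
            # Assumption: lowercase, whitespace-separated tokens
            counts.update(transcript.lower().split())
    return {word for word, _ in counts.most_common(size)}


def map_oov(words: list[str], vocab: set[str], unk: str = "<unk>") -> list[str]:
    """Replace out-of-vocabulary words with a placeholder token."""
    return [w if w in vocab else unk for w in words]
```

With a size of 3 and the toy transcripts "yeah i know", "yeah right", "i know yeah" and "uh huh", the vocabulary is {"yeah", "i", "know"}, and "yeah uh i know" maps to ["yeah", "<unk>", "i", "know"]. Microsoft's vocabulary works on the same principle at a size of 30,000 words.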

Computer speech recognition has come a long way. Twenty years ago, the best published research system had a word error rate of more than 43 per cent, Microsoft said.

Speech recognition is a hot trend in AI as companies race to build the best AI personal assistant. Microsoft has Cortana, Apple has Siri, Google is developing its WaveNet speech-generation system and Amazon has just announced a UK release for its Echo device. ®


Biting the hand that feeds IT © 1998–2017