SpinVox: Veni, vidi, descripsi

A report from the technology demo

Yesterday's technology demonstration by SpinVox at its Marlow HQ reminded everyone just how hard it is to do voice-to-text machine translation, and how far anyone is from automating the bulk of voicemail translation in the real world.

All of the messages supplied by our small group of visitors tripped through to a human operator. The event was unnecessary and humiliating for all concerned. SpinVox shouldn't have had to lift its skirts; we didn't need to be there.

It's hard to see what more CIO Rob Wheatley could have done to explain the company's Voice Mail Conversion System, or VMCS, other than acknowledge the real human:machine ratio - which no one expected him to do.

As far as we can conclude, yesterday's demos were no fake. We saw that the SpinVox technical architecture is sound, and the experts - in particular, Tony Robinson - are first rate, smart and practical.

But CEO Christina Domecq wants to show that SpinVox is a different kind of business to the one it really is - a variable cost business heavily dependent on human labour. The demo put paid to the notion that SpinVox has cracked this problem.

The server side automation works like this.

It checks an audio recording of the voicemail for hang ups or duff calls (such as accidents). The server side does indeed contain speech recognition - "how we train it is a secret", said Wheatley. DSP is applied to clean up the audio, if possible.

The server side recognition can - unless what we were watching was all an elaborate hoax - recognize quite a few simple words in a clean room environment with no shouting, slurring, background speech or music. If a message fails machine recognition, it's sent to a QC agent in a call centre. None of this has ever been in dispute - what is contended is the proportion of messages that require human intervention.

"We've made a hybrid of two qualities," said Wheatley, referring to the machine portion and the human portion. "Human QC agents that are consulted if the confidence score is not sufficiently high."
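The flow Wheatley described can be sketched roughly as follows. This is an illustration only, not SpinVox's code: the threshold value, the 0-to-1 confidence scale, and the names `Recognition` and `route` are all assumptions, since the company disclosed none of them.

```python
from dataclasses import dataclass

# Hypothetical value: SpinVox did not say how high "sufficiently high" is.
CONFIDENCE_THRESHOLD = 0.85


@dataclass
class Recognition:
    text: str          # the machine's best-guess transcript
    confidence: float  # assumed to run from 0.0 to 1.0


def route(is_hangup: bool, recognition: Recognition,
          threshold: float = CONFIDENCE_THRESHOLD) -> str:
    """Decide where a voicemail goes next: 'discard', 'deliver' or 'qc_agent'."""
    if is_hangup:
        # Hang ups and accidental calls are filtered out first.
        return "discard"
    if recognition.confidence >= threshold:
        # The machine transcript is trusted and sent as-is.
        return "deliver"
    # Otherwise a human QC agent is consulted.
    return "qc_agent"
```

The whole dispute comes down to how often the last branch is taken: the lower the typical confidence on real-world audio, the more messages land with a human.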

The QC agent uses a type-ahead program called Tenzing, which is filled with the output of the machine translation attempt. (Tenzing is also called the LQT.) It uses a "lattice", or decision matrix, and some knowledge is incorporated. This is the "machine learning" part.

SpinVox remembers "call pairs" so names are remembered on a per-user basis. This raised the privacy issue; SpinVox didn't know how long it kept the audio files, but like Google, said it wants to keep them as long as possible for learning purposes.
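One way to picture a "call pair" memory is a lookup keyed on caller and recipient. The sketch below is a guess at the idea, not a description of SpinVox's system: it assumes a call pair means the (caller, recipient) numbers, and the class and method names are invented for illustration.

```python
from collections import defaultdict


class CallPairMemory:
    """Remembers names seen in past messages, per (caller, recipient) pair."""

    def __init__(self) -> None:
        # Assumption: a "call pair" is keyed on the two phone numbers.
        self._names: dict[tuple[str, str], set[str]] = defaultdict(set)

    def remember(self, caller: str, recipient: str, name: str) -> None:
        """Store a name that was confirmed in a message between this pair."""
        self._names[(caller, recipient)].add(name)

    def known_names(self, caller: str, recipient: str) -> list[str]:
        """Return names previously seen for this pair, sorted alphabetically."""
        return sorted(self._names.get((caller, recipient), set()))
```

Names stored this way would only ever be suggested for the same pair of numbers, which is what "per-user" implies, but it also means the system is holding personal data for as long as the memory is kept.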

How well did Tenzing work? The program is zippy and certainly speeds up typing. As I noted above, all but the first examples tripped through. One short, clear message contained the word Sainsbury's while the rest of the phrase had been pretty accurately translated by the machine.

But in a very rapid message left by one visitor, Milo Yiannopoulos, several passes had to be made - I'd estimate about six or seven. I have no doubt from first-hand experience of SpinVox that the message would have been flagged untranslatable - you would have been told to call a number to hear the audio yourself - or (and this happened increasingly frequently) been filled with so many guesses(?) or ___ blanks that you would have had to call and listen to it anyway.

So we weren't actually looking at technology per se - we were looking at an operational policy that attempts to minimise the time an agent spends trying to decipher the audio. Because time is money. And that's the difference between a good service and a poor one - human skill.

"We're the only people delivering this service at this scale," said the CIO.

I mentioned that Jott was also exploring human/machine hybrids, while PhoneTag (formerly Simulscribe) claimed to be the biggest in the US. Last month Nuance acquired Jott for an undisclosed sum.

Towards the end, after prolonged badgering on the machine:human question, an exasperated Wheatley asked, "Does it matter? What we do is effective, and we've got carriers."

Indeed, if SpinVox had taken this approach from the start - "we use humans, so what?" - we wouldn't all be where we are now.®
