Original URL: http://www.theregister.co.uk/2012/09/03/hd_voice_stress/
Dodgy audio connections conceal more than just words
How much will HD Voice tell you?
Researchers at BT, working with UCL, have been looking at voices to see what's stopping the machines from working out how you feel as well as what you're saying.
Today's call centres routinely use stress analysis to see if callers are lying, despite the fact that such systems generally don't work. Up at BT's Adastral Park, though, they're testing high-quality audio connections to see if better audio will give greater insight into what callers are thinking, and perhaps provide a business case for HD audio too.
Fraud identity systems work as primitive lie detectors: the first few questions asked (name, policy number, etc.) are used to calibrate the system, which then decides if later questions ("was your car locked when you left it?") are answered truthfully. The idea is top, if you're an insurance company, but the rampant inaccuracy often causes more harm than good, so few companies bother with them.
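The calibrate-then-compare idea can be sketched in a few lines of Python. This is only an illustration: nothing here says how real fraud systems score a voice, so the numeric "stress scores", the function names `calibrate` and `flag_answer`, and the two-standard-deviation threshold are all assumptions made for the example.

```python
from statistics import mean, stdev


def calibrate(baseline_scores):
    """Summarise the caller's 'normal' voice from the easy opening questions.

    baseline_scores are hypothetical per-answer stress scores; how a real
    system would derive them from audio is not covered by this sketch.
    """
    if len(baseline_scores) < 2:
        raise ValueError("need at least two calibration answers")
    return mean(baseline_scores), stdev(baseline_scores)


def flag_answer(score, baseline, threshold=2.0):
    """Flag an answer scoring more than `threshold` deviations above baseline.

    The threshold of 2.0 is an arbitrary assumption, not a published figure.
    """
    mu, sigma = baseline
    if sigma == 0:
        # No variation in the calibration answers: fall back to a plain comparison.
        return score > mu
    return (score - mu) / sigma > threshold


# Invented scores for name, policy number and address.
baseline = calibrate([0.9, 1.0, 1.1])

print(flag_answer(1.05, baseline))  # close to the caller's normal range
print(flag_answer(1.6, baseline))   # far above the caller's normal range
```

A scheme like this is only as good as the scores fed into it, which is where line quality and software accuracy both come in.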
Some of that inaccuracy will be down to the quality of the phone line, with the rest being the fault of the software; how much can be attributed to each is what this research project is setting out to establish. The question isn't just about computers recognising emotions: better phone lines might help people communicate too, but proving that is surprisingly complicated.
Fortunately BT has access to "Natural Voice", a half-megabyte phone system capable of conveying every quiver of timbre and nuance of pitch, which provides a baseline against which compression technologies and rates can be tested. The idea is to establish how much compression can be applied without affecting listeners' ability to discern emotion, then to work out if such an ability is worth paying for.
The testing involves recordings made by people under stress, induced using headphone-piped motorcycle sounds. Those recordings are mixed with unstressed voices and played back with different compression to see if humans, or machines, can empathise with the speaker. Stress was chosen 'cos it's easy to induce, and humans can, it seems, recognise it in most cases, with hit rates approaching 80 per cent. Computers can do even better, with an accuracy of well over 90 per cent when working with perfect audio, but early testing seems to confirm that the accuracy tails off as the compression increases.
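For the curious, the bookkeeping behind such a test might look something like the Python below. It is a rough sketch under stated assumptions: the bitrates, the trial results and the tolerance are all invented for illustration, and the function names `accuracy_by_bitrate` and `lowest_acceptable_bitrate` are hypothetical, not anything from the BT project.

```python
from collections import defaultdict


def accuracy_by_bitrate(trials):
    """Fraction of correct stressed/unstressed judgements at each bitrate.

    trials: iterable of (bitrate_kbps, truly_stressed, judged_stressed).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for bitrate, truth, judged in trials:
        totals[bitrate] += 1
        hits[bitrate] += int(truth == judged)
    return {b: hits[b] / totals[b] for b in totals}


def lowest_acceptable_bitrate(accuracy, reference_bitrate, tolerance=0.05):
    """Lowest bitrate whose accuracy is within `tolerance` of the reference.

    Assumes accuracy broadly falls as bitrate drops; it does not check for
    odd non-monotonic results.
    """
    floor = accuracy[reference_bitrate] - tolerance
    return min(b for b, acc in accuracy.items() if acc >= floor)


# Invented trial data: (bitrate in kbps, speaker was stressed, listener said stressed).
trials = [
    (256, True, True), (256, False, False), (256, True, True), (256, False, False),
    (64, True, True), (64, False, False), (64, True, True), (64, False, True),
    (8, True, False), (8, False, False), (8, True, False), (8, False, True),
]

accuracy = accuracy_by_bitrate(trials)
for bitrate in sorted(accuracy, reverse=True):
    print(f"{bitrate} kbps: {accuracy[bitrate]:.0%} correct")

print(lowest_acceptable_bitrate(accuracy, reference_bitrate=256, tolerance=0.3))
```

The same tally works whether the judge is a human listener or a machine classifier; only the source of the `judged_stressed` column changes.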
As is the case with a lot of BT research, the project is funded with an EU grant. The work is taking place at BT's Adastral site, and lead researcher Charles Ray (relation) is on day release from the former telecoms monopoly, which is hardly surprising given the obvious commercial imperative.
High-quality audio has been possible on both fixed and mobile networks for at least a decade, but users aren't interested - data rates have shot up but the quality of a voice call hasn't improved in decades. If BT can convince businesses that better audio means more empathy then they will be queuing up to buy it, but proving that point will need a decent lump of EU money and at least another year of work. In the meantime you might want to consider the quality of your connection when telling that really big lie. ®