Dismayed by woeful AI chatbots, boffins hired real people – and went back to square one

Amazon Turk serfs have their own problems


Analysis Convinced that intelligent conversational assistants like Amazon Alexa, Microsoft Cortana, and Apple Siri are neither particularly intelligent nor capable of sophisticated conversation, computer boffins last year began testing a crowd-powered assistant embodied by Amazon Mechanical Turk workers.

The chatbot, a people-powered app called Chorus, proved better at conversation than software-based advisors, but hasn't managed to overcome poor human behavior.

Described in a recently published research paper, Chorus was developed by Ting-Hao (Kenneth) Huang and Jeffrey P. Bigham of Carnegie Mellon University, Walter S. Lasecki of the University of Michigan, and Amos Azaria of Ariel University.

The researchers undertook the project because chatbots are just shy of worthless, a sorry state of affairs made evident by the proliferation of labelled buttons in chatbot interfaces. Businesses the world over had hoped that conversational software could replace face-to-face reps and people in call centers, as the machines should be far cheaper and easier to run.

The problem is simply that natural language processing in software is not very good at the moment.

"Due to the lack of fully automated methods for handling the complexity of natural language and user intent, these services are largely limited to answering a small set of common queries involving topics like weather forecasts, driving directions, finding restaurants, and similar requests," the paper explains.

Jeff Bigham, associate professor at Carnegie Mellon's Human-Computer Interaction Institute, in a phone interview with The Register, said, "Today, if you look at what's out there, like Siri, they do a pretty good job using specific speech commands. But if you want to talk about anything you want, they all fail badly."

Bigham and his colleagues devised a system that connects Google Hangouts, through a third-party framework called Hangoutsbot, with the Chorus web server, which routes queries to on-demand workers participating in Amazon Mechanical Turk.
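For the curious, the routing step can be pictured with a toy model. What follows is only a sketch under loud assumptions: `CrowdRouter`, `handle_message`, and the `Worker` callables are hypothetical names invented for illustration, and none of it reflects the actual Chorus, Hangoutsbot, or Mechanical Turk code. It simply shows one chat message fanning out to a pool of human stand-ins, with their replies gathered per conversation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-in for a crowd worker: takes the user's message, returns a reply.
Worker = Callable[[str], str]


@dataclass
class CrowdRouter:
    """Toy model of the web-server hop: chat message in, crowd replies out."""

    workers: List[Worker]
    # Replies recorded per conversation, keyed by a made-up conversation ID.
    log: Dict[str, List[str]] = field(default_factory=dict)

    def handle_message(self, conversation_id: str, text: str) -> List[str]:
        # Fan the message out to every worker in the pool and collect the replies in order.
        replies = [worker(text) for worker in self.workers]
        self.log.setdefault(conversation_id, []).extend(replies)
        return replies


# Usage: two fake workers answer the same query.
router = CrowdRouter(workers=[lambda t: "A: " + t, lambda t: "B: " + t])
replies = router.handle_message("c1", "How many suitcases can I take?")
```

A real deployment would have to recruit workers on demand and decide which of their replies actually reach the user; this sketch skips both.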

Chorus is not the first project to incorporate a living backend, the research paper acknowledges, pointing to projects like VizWiz, which crowdsources help for the blind. Its aim is to explore the challenges of deploying a crowd-based system and to suggest future avenues of research for improving conversational software.

Real people, it turns out, are fairly adept at extemporaneous conversation, even if they're basically meat-to-metal bridges for Google Search queries in Chorus.

During the test period last year, 59 people participated in 320 conversations, which lasted more than 10 minutes and involved more than 25 messages on average. A lengthy sample exchange presented in the paper details a conversation about the number of suitcases a person can take on a plane from the US to Israel. It reads like a call center transcript.

The average cost of each HIT – Amazon Mechanical Turk terminology for a task – came to $5.05. The average daily cost came to $28.90.

So far so good. But while people may have an edge with words, they bring with them their own set of problems.

