SpinVox: The Inside Story
Santa's Little Helpers
In a statement issued on Sunday, SpinVox admits it needs call centres staffed by human agents to transcribe voice messages and has begun to back away from its earlier claims that most of the translation is performed by AI-based machine translation software, without human intervention.
But thanks to company insiders and company filings, The Register has built up a picture of the company that makes believing even SpinVox's revised claims extremely difficult. Sources suggest SpinVox, a privately held company, is employing a far larger number of transcribers than it publicly states, even today. These sources also point to extreme difficulties in maintaining its operations as the company scaled, winning new carrier contracts in new markets. And an investigation into the company's much-vaunted intellectual property holdings indicates that it holds no machine translation patents.
The humans can make themselves felt. In one case, unpaid staff in Pakistan took over the centre and began broadcasting "distress" text messages to SpinVox subscribers in North America.
By insisting that its operation relies primarily on machines, rather than human manpower, SpinVox avoids security issues and can maintain a much higher corporate valuation. Mobile carriers are aware that 'Mechanical Turk' transcription (named after the chess-playing 18th-century automaton that concealed a human operator) has high costs, as Vodafone found out with its human-assisted service.
SpinVox's success hinges on an apparent miracle, one made in defiance of the state of the art in machine translation. It claims to translate voicemail messages with little or no human intervention. This is SpinVox's singular claim to fame, and has made it the darling of the press and investors. SpinVox has won $200m of investment and grown rapidly.
SpinVox executives repeated the claim last week: "the ratio of humans to messages and humans to number of users is very, very low," CEO Christina Domecq insisted, adding that "the majority of calls are fully automated." Messages by UK subscribers are handled in the UK, she added. SpinVox director Matthew Hobbs told Sky:
"We don't actually need to send any messages to human agents... All messages in the first instance will go through our automated voice message conversion system. Only if the system itself is unsure of a particular word or a particular fragment of the message will either a whole or part of the message be sent to an agent for quality control purposes. This in turn is fed back into the system to train it in a live learning mode."
But SpinVox's Sunday post backtracks - pointing to "five world class call centres" and for the first time, the significance of "human agents" - as the transcribers are called.
But former staff in key positions at SpinVox tell a very different story:
SpinVox insiders claim the company employs between 8,000 and 10,000 human agents around the world, and has more than the five transcription centres it says are in use.
"When you join they tell you that the technology server translates 92 per cent of the time. Then you're dragged into the HR room and made to sign an NDA. It's then that they tell you the true story. No more than two per cent of messages are not transcribed by humans. These are very simple messages such as 'Hello John, Call me back'".
SpinVox transcription centres span the globe, and as the company repositioned itself as a B2B rather than a B2C business - winning carrier contracts with Vodacom, Telstra and Rogers - increasing numbers of agents were employed. In its home country of the UK, SpinVox conspicuously failed to land a carrier contract - but this helped create an image of a plucky outsider relying on brilliant machine translation technology.
In a statement to The Register today, SpinVox conceded that five wasn't the full picture - there are indeed more.
"SpinVox has relationships with five major secure call centre suppliers around the world with some of these suppliers operating multiple call centres, all subject to the same security rules." (our emphasis) The company says 3,000 agents are used, and that "The ratio of agents to active users when SpinVox started was 5000 per million users. It is now 100 agents per million users".
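The figures quoted above can be checked against each other with simple arithmetic. The sketch below takes the company's numbers at face value (3,000 agents, 100 agents per million active users) alongside the insiders' estimate of 8,000 to 10,000 agents; the implied user base is an inference from those figures, not a number SpinVox has published.

```python
# Figures as quoted in the article; everything derived from them is an inference.
COMPANY_AGENTS = 3_000            # agents, per SpinVox
COMPANY_RATIO = 100               # agents per million active users, per SpinVox
INSIDER_AGENTS = (8_000, 10_000)  # agent range, per SpinVox insiders

# Active users implied if both company figures hold at once.
implied_active_users = COMPANY_AGENTS * 1_000_000 // COMPANY_RATIO

# Agents per million users if the insiders' headcount applied to that user base.
insider_ratios = [
    round(agents * 1_000_000 / implied_active_users) for agents in INSIDER_AGENTS
]

print(f"Implied active users: {implied_active_users:,}")
print(f"Insider agents per million users: {insider_ratios[0]} to {insider_ratios[1]}")
```

On these assumptions the company's two figures imply an active user base in the tens of millions, and the insiders' headcount would put the ratio at roughly three times the company's stated 100 per million.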
The job of maintaining the rapidly growing business was entrusted to Trainers. SpinVox employed around a dozen, providing technical and culture-specific training to new agents. Several centres were in South Africa, with others in Mauritius, South America and, most famously in SpinVox folklore, Pakistan. For many agents English wasn't their first language.
SpinVox maintains that "the majority of the messages are converted by machine alone", adding that, "the machine seeks assistance when required. This means that any message could require between 0 and 100 per cent assistance from a human agent depending to what extent the VMCS technology has learnt the voice of the person leaving the message. Typically this takes around eight calls from an individual to a SpinVox user to reach a steady state of automation."
SpinVox declined to give a figure. "It is our confidential business formula. It is literally the ratio that any competitor or company wanting to start a business in the potential multi-billion dollar marketplace that SpinVox actually created would love to know so that they could come after us. No business is going to give its competitors a helping hand and SpinVox is no different."
Security concerns raised last week are fully justified, sources say.
According to the company, "agents working in a Live environment have no knowledge of customer, individual, product, market or use."
But according to a SpinVox insider: "There's zero security on these messages. An 18 or 19 year old kid is listening to a voicemail from a husband and wife - exchanging personal and financial information. It's outrageous."
So if miraculous speech transcription isn't in SpinVox's arsenal - then what is?
Spinning SpinVox's "Brain"
SpinVox's human agents use a software application that predicts words as its agents type them in. When the agent performing the transcription types "Hello", it can anticipate that the next most likely words that follow will be "How are you?".
Without this software, SpinVox's human agents couldn't translate the messages "in near-real time", as the company claims.
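Word prediction of this kind needs no speech recognition at all. As a rough illustration only (this is not SpinVox's code, and the sample transcripts and function names are hypothetical), a predictor can simply count which words have followed which in earlier transcripts and offer the most frequent continuations as the agent types:

```python
from collections import Counter, defaultdict


def train_bigrams(transcripts):
    """Count, for every word, which words followed it in past transcripts."""
    following = defaultdict(Counter)
    for text in transcripts:
        words = text.lower().split()
        for current, nxt in zip(words, words[1:]):
            following[current][nxt] += 1
    return following


def suggest(following, typed_word, k=3):
    """Return up to k of the most frequent continuations of typed_word."""
    counts = following.get(typed_word.lower(), Counter())
    return [word for word, _ in counts.most_common(k)]


# Hypothetical sample of previously transcribed messages.
past_transcripts = [
    "hello how are you",
    "hello how are things",
    "hello john call me back",
    "call me back please",
]

model = train_bigrams(past_transcripts)
print(suggest(model, "Hello"))  # most frequent words seen after "hello"
print(suggest(model, "call"))   # most frequent words seen after "call"
```

The point of the sketch is that the typing is still done by a person; the software only shortens it.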
But SpinVox has consistently sought to blur the precise role and definition of this software. It goes by several names. SpinVox agents know it as "Tenzing", while SpinVox's publicity literature refers to something it calls D2, which it says is "The Brain".
"D2’s pretty smart. It’s bound to be, as D2’s a combination of artificial intelligence, voice recognition and natural linguistics," says SpinVox.
Meanwhile the company's IP filings refer to an acronym VMCS, or the Voice Message Conversion System. The company also markets VMCS as a "cloud platform" for transcription.
A poster claiming to be from SpinVox last year described VMCS as "a carrier grade engine capable of converting voice-into-text in four different languages," asserting that, "the important point is that it is the SpinVox VMCS, not humans in the Philippines or anywhere else, that converts voice messages into text."
That's false, say people familiar with VMCS.
But on the basis of SpinVox's patent applications, the software that agents use for transcribing most of SpinVox's messages seems not to be performing machine translation at all, but doing something much more mundane - word prediction, something commonly performed by specialist packages for disabled users, such as Penfriend XL, and even by mobile phones.
"If the majority of messages get converted by machine - why do they need world class call centres?" said one former SpinVox employee.
The IP arsenal that doesn't exist
Clear evidence that SpinVox depends largely on humans, not some artificial intelligence breakthrough (and knows it), comes from SpinVox itself.
In SpinVox's boilerplate text, attached to every press release, the company claims to have made "significant innovations in voice and network technologies which are protected by over 70 patents worldwide".
Where does the figure of 70 come from? A global search of IP databases reveals just eight listings. Most are patent applications, which offer the inventor some, but only limited, protection. Each listing is a cluster containing multiple applications for the same patent, one to each patent authority, such as the UK IPO, the EPO and WIPO. Add them all up, and you get around 70.
SpinVox filings cover a number of innovations, and when read chronologically, tell their own story. The first filing simply describes a human powered call centre, while the most recent draws the "VMCS" as simply a box in the cloud. All of them describe business methods and applications. For example, recent filings describe uses for a translation system - such as speaking blog posts or emails, and ideas for inserting media or advertising information into an SMS text message. All these are business applications or methods.
Quite significantly, something is missing.
"None of the patents I have seen describes speech translation," says Lyndsay Williams, head of Girton Labs in Cambridge and former Microsoft Researcher.
In the United States, only one patent has been granted: and it's for a human-powered call centre.
In 2004 co-founder Daniel Doulton filed a patent for a "Method of providing voicemails to a wireless information device". This accurately describes SpinVox's business operation in great detail, insiders confirm.
In the patent description's own words:
"...the operator intelligently transcribes the actual message from the original voice message by entering the corresponding text message (actually a succinct version of the original voice message, not a verbose word-for-word conversion) into the computer to generate a transcribed text message. The transcribed text message is then sent to the wireless information device from the computer. Because human operators are used instead of machine transcription, voicemails are converted accurately, intelligently, appropriately and succinctly into text messages (SMS/MMS)."
The AI magic is merely an attribute. One section of the patent describes "Automated Voice Recognition" which, the application (No. 20060223502) explains,
"is to speed up the processing of inbound voice files and reduce operating costs. The prime function will be to auto-detect spoken phone numbers, and detect language to route audio files to the correct human operator staffed transcription bureau. It will also be used for detecting names and spoken numbers and addresses from the users online phone-book (see below) and commands for VoicemailManager controls."
This undermines the claim, made by SpinVox this week, that "data is encrypted". A SpinVox user's privacy depends not on encryption, but on its overseas call-centre agents behaving. As the patent notes:
"All transcription employees must have signed a confidentiality agreement before being able to deal with any messages and must not divulge, share, copy, forward or otherwise share any user information."
While agents cannot see the sender ID, they can identify the recipient by phone number. SpinVox insisted in a statement that agents are unaware of the recipient's phone number unless the person leaving the message included it in the body of the message.
"All the operators are still bound to comply with the strict data protection requirements of their contracts," SpinVox told us.
But the agents do not always behave. In one notorious incident famous in SpinVox company folklore, the agents took over the call centre.
What happened in Pakistan?
SpinVox sources describe a company struggling to cope with its own rapid growth. Allegations of unpaid expenses and unpaid agents abound. In one instance, these factors came to a head, with staff sending an "SOS message" to bemused phone users in North America.
"Dear customer. We are employees of SpinVox. We convert your messages here in Pakistan. Since SpinVox has stopped has paying us. We won’t be able to convert your messages from now onwards. There is no software that converts the messages. We humans do. – powered by SpinVox"
"A couple of girls were out there as trainers," sources told us. "They got them out quick. SpinVox cancelled the contract."
What now for SpinVox?
The SpinVox story is becoming increasingly confused. Domecq claims 3,000 agents are used by the company, while sources put the number much higher, between 8,000 and 10,000. The company stands by the figure of 3,000.
Questions have also been raised about the judgment of investors Goldman Sachs and Ariadne Capital, who between them injected $200m of other people's money into the company.
In an emotional blog posting, Ariadne's Julie Meyer played psychologist - musing about the motives of "a cheerleadership of malcontents" - and also played the gender card. Amongst many things, Meyer wrote that she loved CEO Christina Domecq's "search for excellence, and her driiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiive", and said SpinVox was "the first major technology success story out of Europe founded and led by a woman. You’ve got to love that too. Go girl!"
We asked SpinVox if, with so many unanswered questions, it was wise to go on the offensive. The company said that it "is under anonymous and malicious attack from a group of disgruntled former employees. After two weeks of investigation the company has identified nine of those who are responsible and has started legal action against them."
Meyer claimed SpinVox had "strong IP", "world class technology" and a "staggering" rate of innovation. Yet in five years, SpinVox has had just one patent approved in the United States: Doulton's application described above, for a human-powered call centre, was finally granted (No. 7,532,913) on May 12.
SpinVox's carrier customers don't make decisions based on sentiment or emotion, but on the assumption that the business can grow. As one insider told us:
"Carriers are going to go, 'Hang on a second!'. They are going to ask to be shown the technology that does this transcription. SpinVox won't be able to do that, because it doesn't exist." ®