IBM's megabrain Watson to make mobe, slab apps smarter? Not so fast
We drill into Big Blue's dream to put data-muncher monster in your palm
Analysis IBM wants developers to build smartphone apps that use Big Blue's clever Jeopardy!-beating Watson software.
But harnessing the TV star's silicon brain will require more than just invoking a few API calls with JSON: the app programmers will have to do a lot of heavy lifting themselves to train Watson.
The Watson Mobile Developer Challenge was fluffed by IBM in a press release on Wednesday and during a speech by chief executive Virginia Rometty at Mobile World Congress.
It gives programmers a chance to compete over the next three months to come up with ideas for "cognitive computing" applications that use Watson's capabilities, and successful ones will be paired with IBM's Interactive Experience Group to help them develop a viable commercial product.
For now, we'll put aside the fact that most other app competitions involve the winner getting cold hard cash – Salesforce shelled out $1 million, for instance. Instead, we'll delve into some of the technical issues that will make developing for Watson a new and sometimes frustrating experience.
IBM claims that Watson "processes information akin to how people think." This is half right – Watson constructs an internal model from the data you throw at it in order to understand it, but training Watson to deal with that information takes a long time, and the system is still quite brittle.
Watson's fundamental technology is a decision engine that is able to analyze and answer questions about data loaded into it, such as what symptoms may be indicative of certain cancers, or the correct financial product to recommend to someone given their situation. It is an immensely powerful technology and represents years of research by IBM and academics.
The catch, as highlighted by El Reg, is that this approach requires developers to load a large amount of data into Watson's underlying Hadoop and Apache UIMA-based "DeepQA" analysis engine. Watson then needs to be trained on the data to allow it to develop an appropriate mental model of the information. This takes time, and limits the range of apps that can be built on the system.
"The way that training occurs is through an iterative process, very much like school," explained IBM Watson veep Stephen Gold, in a chat with El Reg.
"How long it takes is a byproduct of the actual use case. If I'm teaching basic arithmetic, the process moves very quickly and I can answer questions in a very short period of time. If I want the system to be able to perform advanced differential equations, I know I need to build through an advanced set of learnings."
'We don't have a lot of partners who want to boil the ocean'
Although new datasets take a while to integrate, once this is complete, related material can be added in a shorter timeframe: when Watson was first put to work analyzing cancer, it took a year to fully integrate information involving lung cancer, but then only took six months to add breast cancer, and three months to add in colon cancer, Gold explained.
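For the numerically inclined, Gold's figures can be tallied in a few lines of Python. This is only back-of-the-envelope arithmetic on the three durations quoted above; the variable names are ours, and nothing here suggests the halving pattern would hold for a fourth dataset.

```python
# Integration times quoted by Gold, in months ("a year" taken as 12).
integration_months = {"lung": 12, "breast": 6, "colon": 3}

# Total time to get all three cancer datasets into Watson.
total = sum(integration_months.values())

# Ratio of each integration time to the one before it.
times = list(integration_months.values())
ratios = [later / earlier for earlier, later in zip(times, times[1:])]

print(total)   # 21
print(ratios)  # [0.5, 0.5]
```

In other words, the three cancer types took 21 months between them, with each related dataset taking half as long as the one before.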
So, when IBM says it hopes developers will build apps for Watson, it's worth pointing out that if IBM hasn't stored the exact data the developers would like to drill into, the developers will need to work with IBM to get that knowledge into Watson.
This could take "between weeks and months," Gold said, before pointing out that "most of the apps we see are not nearly as complex as cancer. Most of them have a finite information they're working with [such as] product manuals."
In the short term, Watson will likely be an amazing technology for apps that require a decision-making capability, but we're a long, long way from the types of general intelligence models that would turn Watson from a sophisticated Fabergé egg into a tech of broad utility.
"To get to a general application, what you'd have to do is have enough experience and time to train Watson on all things possible," Gold explained. "What we find with applications, they are very purposeful in the [particular] problem they are trying to solve. We don't have a lot of partners who want to boil the ocean."
Watson will "continue to get smarter," he said, but as IBM tries to integrate more and more data into a single model it will start to run into a problem that even human adults have trouble solving – dealing with contradicting data.
"It's not so much about the volume [of data]," Gold explained. "If the veracity is high of that information and is uncontroverted, it could be a terabyte and you could train Watson quickly, but where the veracity is in question or sources of evidence are contradictory, Watson needs to iterate through not only the training but also the use... there's a lot more to do not so much with the number of sources of data, it's how much the data that's being collected is conflicted with itself."
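To make Gold's point concrete, here is a toy sketch of what "conflicted with itself" could mean for a pile of source material. It is emphatically not how Watson measures anything: the `conflict_ratio` function, the `(topic, answer)` representation and the sample product-manual claims are all hypothetical, invented purely for illustration.

```python
from collections import defaultdict


def conflict_ratio(claims):
    """Return the fraction of topics on which sources give more than one distinct answer."""
    answers = defaultdict(set)
    for topic, answer in claims:
        answers[topic].add(answer)
    if not answers:
        return 0.0
    conflicted = sum(1 for distinct in answers.values() if len(distinct) > 1)
    return conflicted / len(answers)


# Hypothetical claims pulled from two editions of a product manual.
claims = [
    ("factory reset", "hold power for 5 seconds"),
    ("factory reset", "hold power for 10 seconds"),  # contradicts the line above
    ("battery type", "AA"),
    ("battery type", "AA"),                          # sources agree
]

print(conflict_ratio(claims))  # 0.5
```

By this crude measure, a terabyte of manuals that all agree would score zero, while a handful of pages that disagree on half their topics would score 0.5, which is the sort of distinction Gold is drawing between volume and veracity.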
For example, if Watson had been fed a full dataset from the 1400s that stated unequivocally the world was flat, it would take it some time to adjust to new data coming in that stated the world was round, but adjust it would. This is fundamentally different to how current computers work and is a laudable, fascinating bit of technology. It is not, however, easy or simple or trivial, so developers keen to develop Watson apps will need to be very specific at first about the types of data they want to draw on. We wish them the best of luck in grappling with Big Blue's Big Brain. ®