IBM's megabrain Watson to make mobe, slab apps smarter? Not so fast

We drill into Big Blue's dream to put data-muncher monster in your palm

Analysis IBM wants developers to build smartphone apps that use Big Blue's clever Jeopardy!-beating Watson software.

But harnessing the TV star's silicon brain will require more than just invoking a few API calls with JSON: the app programmers will have to do a lot of heavy lifting themselves to train Watson.

The Watson Mobile Developer Challenge was fluffed by IBM in a press release on Wednesday and during a speech by chief executive Virginia Rometty at Mobile World Congress.

It gives programmers a chance to compete over the next three months to come up with ideas for "cognitive computing" applications that use Watson's capabilities, and successful ones will be paired with IBM's Interactive Experience Group to help them develop a viable commercial product.

For now, we'll put aside the fact that most other app competitions involve the winner getting cold hard cash – Salesforce shelled out $1 million, for instance. Instead, we'll delve into some of the technical issues that will make developing for Watson a new and sometimes frustrating experience.

IBM claims that Watson "processes information akin to how people think." This is half right – Watson does construct an internal model of the data you throw at it, but training it to deal with that information takes a long time, and the system is still quite brittle.

Watson's fundamental technology is a decision engine that is able to analyze and answer questions about data loaded into it, such as what symptoms may be indicative of certain cancers, or the correct financial product to recommend to someone given their situation. It is an immensely powerful technology and represents years of research by IBM and academics.

The catch, as highlighted by El Reg, is that this approach requires developers to load a large amount of data into Watson's underlying Hadoop and Apache UIMA-based "DeepQA" analysis engine. Watson then needs to be trained on the data to allow it to develop an appropriate mental model of the information. This takes time, and limits the range of apps that can be built on the system.

"The way that training occurs is through an iterative process, very much like school," explained IBM Watson veep Stephen Gold, in a chat with El Reg.

"How long it takes is a byproduct of the actual use case. If I'm teaching basic arithmetic, the process moves very quickly and I can answer questions in a very short period of time. If I want the system to be able to perform advanced differential equations, I know I need to build through an advanced set of learnings."

'We don't have a lot of partners who want to boil the ocean'

Although new datasets take a while to integrate, once this is complete, related material can be added in a shorter timeframe: when Watson was first put to work analyzing cancer, it took a year to fully integrate information involving lung cancer, but then only took six months to add breast cancer, and three months to add in colon cancer, Gold explained.

So, when IBM says it hopes developers will build apps for Watson, it's worth pointing out that if IBM hasn't already stored the exact data the developers would like to drill into, the developers will need to work with IBM to get that knowledge into Watson.

This could take "between weeks and months," Gold said, before pointing out that "most of the apps we see are not nearly as complex as cancer. Most of them have a finite information they're working with [such as] product manuals."

In the short term, Watson will likely be an amazing technology for apps that require a decision-making capability, but we're a long, long way from the types of general intelligence models that would turn Watson from a sophisticated Fabergé egg into a tech of broad utility.

"To get to a general application, what you'd have to do is have enough experience and time to train Watson on all things possible," Gold explained. "What we find with applications, they are very purposeful in the [particular] problem they are trying to solve. We don't have a lot of partners who want to boil the ocean."

Watson will "continue to get smarter," he said, but as IBM tries to integrate more and more data into a single model it will start to run into a problem that even human adults have trouble solving – dealing with contradictory data.

"It's not so much about the volume [of data]," Gold explained. "If the veracity is high of that information and is uncontroverted, it could be a terabyte and you could train Watson quickly, but where the veracity is in question or sources of evidence are contradictory, Watson needs to iterate through not only the training but also the use... there's a lot more to do not so much with the number of sources of data, it's how much the data that's being collected is conflicted with itself."

For example, if Watson had been fed a full dataset from the 1400s that stated unequivocally the world was flat, it would take it some time to adjust to new data coming in that stated the world was round, but adjust it would. This is fundamentally different to how current computers work and is a laudable, fascinating bit of technology. It is not, however, easy or simple or trivial, so developers keen to develop Watson apps will need to be very specific at first about the types of data they want to draw on. We wish them the best of luck in grappling with Big Blue's Big Brain. ®
