Googlers devise DeViSE: A thing-recognising FRANKENBRAIN

Machine-learning tech glues together image eyeballing and text grokking

You'd think guesswork and advanced science would be natural enemies, but not at Google, where a crack team of researchers is trying to mate the two.

In a paper presented on Monday at an artificial-intelligence conference in California, seven Google researchers outlined their image classifier, software that labels pictures by identifying what's in them. It was created by fusing two distinct machine-learning approaches.

In short, the system can make an educated guess at identifying an unfamiliar picture based on the text labels offered to it. For example, if it was shown a photo of a black Victorian top hat it hadn't seen before, and asked if it was a black Victorian top hat or a black pedal-opened wastepaper bin – both labels it also hadn't heard of before – it could guess correctly because it knows what various other hats and garbage bins look like and knows the relationships between their labels.
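
To make that concrete, here's a toy sketch (in Python, and emphatically not Google's code) of how such a guess can be made once pictures and labels live in the same vector space: the unfamiliar photo is mapped to a vector, and whichever candidate label's vector sits closest wins. The vectors below are invented purely for illustration.

```python
# Toy sketch of the zero-shot guess described above. In DeViSE the label
# vectors come from the text model and the image vector from the visual
# model; the numbers here are made up for the example.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

image_vector = np.array([0.9, 0.1, 0.0])  # what the visual model says the photo "means"

candidate_labels = {
    "black Victorian top hat":           np.array([0.8, 0.2, 0.1]),  # lands near other hats
    "black pedal-opened wastepaper bin": np.array([0.1, 0.9, 0.3]),  # lands near other bins
}

best_guess = max(candidate_labels,
                 key=lambda name: cosine(image_vector, candidate_labels[name]))
print(best_guess)  # -> "black Victorian top hat"
```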

The DeViSE: A Deep Visual-Semantic Embedding Model paper [PDF] describes a tech that strives to combine the eerie image-recognition capabilities of Google's traditional weak-AI systems with the broad semantic modeling capabilities of its "skip-gram" text models.

This approach is called "zero-shot learning", and is seen by the Google brain trust (which includes MapReduce co-creator Jeff Dean) as one of the best chances of designing systems that can deal with changeable, poorly labelled datasets – in other words, the info Google's growing fleet of handheld or wheel-bound electronic eyes is likely to slurp up from the world around them.

"The goals of this work are to develop a vision model that makes semantically relevant predictions even when it makes errors and generalizes to classes outside of its labeled training set," they write.

DeViSE contains two elements: a text model that learns what words mean from the company they keep, and an object recognizer that studies images.

The text side is a neural language model trained on 5.7 million documents comprising 5.4 billion words slurped from Wikipedia. The approach lets the tech convert the fuzzy world of language into a numerical space in which each word is defined by its relationships with others.
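
Google trained its skip-gram model with its own tooling; as a rough illustration of the same idea, here's a sketch using the off-the-shelf gensim library (version 4 or later) on a toy corpus. The corpus, vector size and window below are placeholders, not the paper's exact settings.

```python
# Illustrative skip-gram training with gensim (not the code used in the paper).
from gensim.models import Word2Vec

toy_corpus = [
    ["tiger", "shark", "swims", "in", "the", "ocean"],
    ["the", "bull", "shark", "is", "a", "species", "of", "shark"],
    ["a", "nuclear", "submarine", "is", "a", "sea", "vessel"],
]

model = Word2Vec(
    sentences=toy_corpus,
    vector_size=500,  # dimensionality of each word vector
    window=20,        # how many surrounding words count as context
    sg=1,             # 1 selects the skip-gram architecture
    min_count=1,      # keep every word in this tiny corpus
)

shark_vector = model.wv["shark"]               # the learned 500-D vector for "shark"
print(model.wv.most_similar("shark", topn=3))  # its nearest neighbours in the space
```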

The image recognizer, meanwhile, is a "state-of-the-art deep neural network for visual object recognition" that was trained to recognize some 1,000 categories of images.
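
As a stand-in for that visual component (the paper's network is Google's own; the snippet below merely assumes a recent torchvision and a pretrained AlexNet in its place), one can load a 1,000-class model and swap its final layer for an identity, so it spits out reusable visual features rather than class scores.

```python
# Stand-in for the visual component: a pretrained 1,000-class AlexNet with its
# final output layer removed, so it emits 4,096-D feature vectors.
import torch
from torchvision import models

visual_model = models.alexnet(weights="IMAGENET1K_V1")
visual_model.classifier[-1] = torch.nn.Identity()  # drop the 1,000-way output layer
visual_model.eval()

with torch.no_grad():
    fake_photo = torch.randn(1, 3, 224, 224)   # stands in for a real image
    features = visual_model(fake_photo)        # shape: (1, 4096)
```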

Armed with these two powerful technologies, the researchers figured out a way to fuse them so that the model can draw on both when attempting to classify a new image.
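
The fusion itself, as the paper describes it, is a linear transformation that maps visual features into the word-embedding space, trained so that an image's projection scores higher against its own label's vector than against other labels' vectors (a hinge rank loss). The PyTorch sketch below illustrates that recipe; the dimensions and margin are chosen for the example rather than copied from the paper.

```python
# Sketch of a DeViSE-style fusion: project visual features into the word space
# and train with a hinge rank loss. Dimensions and margin are illustrative.
import torch
import torch.nn.functional as F

class VisualToWordSpace(torch.nn.Module):
    def __init__(self, feature_dim=4096, embedding_dim=500):
        super().__init__()
        self.projection = torch.nn.Linear(feature_dim, embedding_dim)

    def forward(self, visual_features):
        return self.projection(visual_features)

def hinge_rank_loss(projected, label_ids, label_vectors, margin=0.1):
    # projected:     (batch, emb_dim) image features mapped into word space
    # label_ids:     (batch,) index of each image's true label
    # label_vectors: (num_labels, emb_dim) unit-norm word vectors for all labels
    projected = F.normalize(projected, dim=1)
    scores = projected @ label_vectors.t()                  # similarity to every label
    true_scores = scores.gather(1, label_ids.unsqueeze(1))  # similarity to the true label
    losses = torch.clamp(margin - true_scores + scores, min=0.0)
    mask = torch.ones_like(losses).scatter_(1, label_ids.unsqueeze(1), 0.0)
    return (losses * mask).sum(dim=1).mean()                # true label incurs no loss
```

At test time the same projection is applied to a new image and the nearest label vector, familiar or not, is returned – which is what lets the model guess at labels it never saw during its visual training.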

This model is marginally more accurate than today's state-of-the-art systems and is inherently more flexible. The researchers hypothesized:

A DeViSE model that was trained on images with labels like "tiger shark", "bull shark", and "blue shark", but never with images labeled simply "shark", would likely have the ability to generalize to this more coarse-grained descriptor because the language model has learned a representation of the general concept of "shark" which is similar to all of the specific sharks. Similarly, if tested on images of highly specific classes which the model happens to have never seen before, for example a photo of an oceanic whitetip shark, and asked whether the correct label is more likely "oceanic whitetip shark" or some other unfamiliar label (say, "nuclear submarine"), our model stands a fighting chance of guessing correctly because the language model ensures that representation of "oceanic whitetip shark" is closer to the representation of sharks the model has seen, while the representation of "nuclear submarine" is closer to those of other sea vessels.

Subsequent experiments detailed in the paper bore out this theory.

Google believes the system has a broad range of applications in some of the search giant's trickiest problem areas.

"We believe that our model's unusual compatibility with larger, less manicured data sets will prove to be a major strength moving forward," the nine wrote. "Though here we trained on a curated academic image dataset, our model's architecture naturally lends itself to being trained on all available images that can be annotated with any text term contained in the (larger) vocabulary. We believe that training massive "open" image datasets of this form will dramatically improve the quality of visual object categorization systems."

And once Google has honed this tech further, it could be applied to a multitude of problems, from distinguishing between broad categories like dogs, cats, and lawnmowers to telling apart specific entities, such as cars like a "Honda Civic, Ferrari F355, Tesla Model-S", they note – capabilities that are crucial ingredients for further developments in Google's key business of highly targeted, automated advertising.

As oil is to the plastics industry, data is to Google: it is the fundamental resource on which the company depends, and the more it can refine it, the more money it can make from it. For this reason machine learning and other deep analytical approaches are a priority for Google as the ad-slinger attempts to automate the classification and tagging of an ever-swelling world of digital data. With this system, it has devised another way to slurp more cash from the ethereal digital world. ®
