Googlers devise DeViSE: A thing-recognising FRANKENBRAIN

Machine-learning tech glues together image eyeballing and text grokking

You'd think guesswork and advanced science would be natural enemies, but not at Google, where a crack team of researchers are trying to mate the two.

In a paper presented on Monday at an artificial-intelligence conference in California, seven Google researchers outlined their image classifier, software that labels pictures by identifying what's in them. It was created by fusing two distinct machine-learning approaches together.

In short, the system can make an educated guess at identifying an unfamiliar picture based on the text labels offered to it. For example, if it was shown a photo of a black Victorian top hat it hadn't seen before, and asked if it was a black Victorian top hat or a black pedal-opened wastepaper bin – both labels it also hadn't heard of before – it could guess correctly because it knows what various other hats and garbage bins look like and knows the relationships between their labels.

The DeViSE: A Deep Visual-Semantic Embedding Model paper [PDF] describes a tech that strives to combine the eerie image recognition capabilities of Google's traditional weak-AI systems with the broad semantic modeling capabilities of its "Skip-gram" text classifiers.

This approach is called "zero-shot learning", and is seen by the Google brain trust (which includes MapReduce-creator Jeff Dean) as one of the best chances of designing systems that can deal with changeable datasets with poor classifications – in other words, the info Google's growing fleet of handheld or wheel-bound electronic eyes are likely to slurp up from the world around them.

"The goals of this work are to develop a vision model that makes semantically relevant predictions even when it makes errors and generalizes to classes outside of its labeled training set," they write.

DeViSE contains two elements: a text classifier that labels text based on its contents, and an object recognizer that studies images.

The text classifier trains a neural language model using 5.7 million documents comprising 5.4 billion words slurped from Wikipedia. The approach lets the tech convert the fuzzy world of language into a numeric space in which each word is represented by a vector defined by its relationships with others.
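To give a flavour of how a skip-gram language model sees text, here is a minimal sketch of the training pairs such a model is typically fed: each word paired with its near neighbours. The function name `skipgram_pairs`, the `window` parameter and the sample sentence are illustrative assumptions, not details taken from the paper.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for each token and its neighbours
    within `window` positions on either side."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical sample sentence; a real model would chew through billions of words.
sentence = "the tiger shark swims".split()
pairs = skipgram_pairs(sentence, window=1)
for center, context in pairs:
    print(center, "->", context)
```

A model trained to predict the context word from the center word ends up placing words that keep similar company close together, which is the property the rest of the system leans on.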

The image recognizer, meanwhile, is a "state-of-the-art deep neural network for visual object recognition" that was trained to recognize some 1,000 categories of images.

Armed with these two power technologies, the researchers figured out a way to fuse the two together so that the model could use both approaches when attempting to classify a new image.
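The article does not spell out how the two are fused, so the following is only a sketch of one plausible recipe: project the image network's features into the word-vector space with a linear map, then score the result with a ranking loss that wants the correct label's vector to beat every other label by a margin. The helper names, the toy matrix, the two-dimensional vectors and the margin value are all assumptions made for illustration.

```python
def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project(matrix, features):
    """Linearly map image features into the word-vector space."""
    return [dot(row, features) for row in matrix]


def hinge_rank_loss(image_vec, label_vecs, true_label, margin=0.1):
    """Sum of margin violations: zero when the true label outscores
    every other label by at least `margin`."""
    true_score = dot(label_vecs[true_label], image_vec)
    loss = 0.0
    for label, vec in label_vecs.items():
        if label != true_label:
            loss += max(0.0, margin - true_score + dot(vec, image_vec))
    return loss


# Toy numbers, purely hypothetical.
projection = [[1, 0, 0], [0, 1, 0]]   # 3 image features -> 2-d word space
image_features = [0.9, 0.1, 0.5]
label_vecs = {"shark": [1, 0], "submarine": [0, 1]}

image_vec = project(projection, image_features)
print(hinge_rank_loss(image_vec, label_vecs, "shark"))
print(hinge_rank_loss(image_vec, label_vecs, "submarine"))
```

In this toy setup the loss is zero when the image is labelled "shark" and positive when it is labelled "submarine", so training would nudge the projection only in the second case.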

This model is marginally more accurate than today's state-of-the-art systems and is inherently more flexible. The researchers hypothesized:

A DeViSE model that was trained on images with labels like "tiger shark", "bull shark", and "blue shark", but never with images labeled simply "shark", would likely have the ability to generalize to this more coarse-grained descriptor because the language model has learned a representation of the general concept of "shark" which is similar to all of the specific sharks. Similarly, if tested on images of highly specific classes which the model happens to have never seen before, for example a photo of an oceanic whitecap shark, and asked whether the correct label is more likely "oceanic whitecap shark" or some other unfamiliar label (say, "nuclear submarine"), our model stands a fighting chance of guessing correctly because the language model ensures that representation of "oceanic whitecap shark" is closer to the representation of sharks the model has seen, while the representation of "nuclear submarine" is closer to those of other sea vessels.

Subsequent experiments detailed in the paper bore out this theory.
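The shark-versus-submarine guess can be sketched in a few lines, assuming the image has already been mapped into the same space as the label vectors. The three-dimensional vectors below are made up for the example, and cosine similarity is used as one common way of measuring closeness; neither is taken from the paper.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def pick_label(image_vec, candidates):
    """Return the candidate label whose vector lies closest to the image's."""
    return max(candidates, key=lambda label: cosine(image_vec, candidates[label]))


# Hypothetical embeddings: neither label was seen with images during training,
# but the language model has placed each one near its relatives.
candidates = {
    "oceanic whitecap shark": [0.9, 0.1, 0.2],   # near other sharks
    "nuclear submarine": [0.1, 0.9, 0.3],        # near other sea vessels
}
image_vec = [0.8, 0.2, 0.1]  # where the vision model lands an unseen shark photo

guess = pick_label(image_vec, candidates)
print(guess)
```

The image vector sits far closer to the shark label than to the submarine one, so the sketch picks the shark without ever having seen a labelled example of it.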

Google believes the system has a broad range of applications in some of the search giant's trickiest problem areas.

"We believe that our model's unusual compatibility with larger, less manicured data sets will prove to be a major strength moving forward," the seven wrote. "Though here we trained on a curated academic image dataset, our model's architecture naturally lends itself to being trained on all available images that can be annotated with any text term contained in the (larger) vocabulary. We believe that training massive 'open' image datasets of this form will dramatically improve the quality of visual object categorization systems."

And once Google has honed the capabilities of this tech further, it could be used for a multitude of problems, such as distinguishing between categories like dogs, cats, and lawnmowers, and also between specific entities, like telling apart cars such as a "Honda Civic, Ferrari F355, Tesla Model-S", they note – capabilities that are crucial ingredients for further developments in Google's key business of highly targeted, automated advertising.

As oil is to the plastics industry, data is to Google: it is the fundamental resource on which the company depends, and the more it can refine it, the more money it can make from it. For this reason, machine learning and other deep analytical approaches are a priority for Google as the ad-slinger attempts to automate the classification and tagging of an ever-swelling world of digital data. With this system, it has devised another approach to let it slurp more cash from the ethereal digital world. ®
