
Googlers devise DeViSE: A thing-recognising FRANKENBRAIN

Machine-learning tech glues together image eyeballing and text grokking


You'd think guesswork and advanced science would be natural enemies, but not at Google where a crack team of researchers are trying to mate the two together.

In a paper presented on Monday at an artificial-intelligence conference in California, seven Google researchers outlined their image classifier, software that labels pictures by identifying what's in them. It was created by fusing two distinct machine-learning approaches together.

In short, the system can make an educated guess at identifying an unfamiliar picture based on the text labels offered to it. For example, if it was shown a photo of a black Victorian top hat it hadn't seen before, and asked if it was a black Victorian top hat or a black pedal-opened wastepaper bin – both labels it also hadn't heard of before – it could guess correctly because it knows what various other hats and garbage bins look like and knows the relationships between their labels.
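In other words, the guess comes down to asking which candidate label's word vector sits closest to where the image lands. Below is a toy sketch of that zero-shot step, assuming the photo has already been mapped into the same space as the words. The 3-D vectors and the `zero_shot_label` helper are hypothetical and hand-made for illustration; in DeViSE the label vectors come from the language model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Hypothetical word vectors for two labels the image model never trained on.
LABEL_VECTORS = {
    "black Victorian top hat": [0.85, 0.15, 0.05],
    "black pedal-opened wastepaper bin": [0.10, 0.90, 0.15],
}

def zero_shot_label(image_embedding, candidates):
    """Pick the candidate label whose vector points the same way as the image."""
    return max(candidates, key=lambda label: cosine(image_embedding, candidates[label]))

# Hypothetical projected image: it lands near hats the model *has* seen.
photo_of_top_hat = [0.8, 0.2, 0.1]
print(zero_shot_label(photo_of_top_hat, LABEL_VECTORS))  # black Victorian top hat
```

Neither label needs a single training photo; all that matters is that its word vector lies near the vectors of things the model already knows.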

The DeViSE: A Deep Visual-Semantic Embedding Model paper describes a technique that strives to combine the eerie image-recognition capabilities of Google's traditional weak-AI systems with the broad semantic modeling capabilities of its "Skip-gram" language models.

This approach is called "zero-shot learning", and is seen by the Google brain trust (which includes MapReduce-creator Jeff Dean) as one of the best chances of designing systems that can deal with changeable datasets with poor classifications – in other words, the info Google's growing fleet of handheld or wheel-bound electronic eyes are likely to slurp up from the world around them.

"The goals of this work are to develop a vision model that makes semantically relevant predictions even when it makes errors and generalizes to classes outside of its labeled training set," they write.

DeViSE contains two elements: a language model that learns what words mean from the words that surround them, and an object recognizer that studies images.

The language side is a neural language model trained on 5.7 million documents comprising 5.4 billion words slurped from Wikipedia. This lets the tech convert the fuzzy world of language into a numeric space in which each word becomes a vector, positioned according to its relationships with other words.
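To give a feel for "defined by its relationships with others": once words are vectors, relatedness can be measured as the angle between them. The sketch below uses made-up 3-D vectors (a real Skip-gram model learns hundreds of dimensions from text); the word list and the `most_similar` helper are hypothetical, for illustration only.

```python
import math

# Toy, hand-made "word vectors"; illustrative values only, not learned ones.
WORD_VECTORS = {
    "hat":    [0.9, 0.1, 0.0],
    "cap":    [0.8, 0.2, 0.1],
    "bin":    [0.1, 0.9, 0.1],
    "bucket": [0.2, 0.8, 0.2],
}

def cosine(u, v):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(word, vectors=WORD_VECTORS):
    """Return the other word whose vector is closest in angle to `word`'s."""
    target = vectors[word]
    others = (w for w in vectors if w != word)
    return max(others, key=lambda w: cosine(target, vectors[w]))

print(most_similar("hat"))     # cap
print(most_similar("bucket"))  # bin
```

Cosine similarity is a common choice here because it ignores vector length and compares only direction.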

The image recognizer, meanwhile, is a "state-of-the-art deep neural network for visual object recognition" that was trained to recognize some 1,000 categories of images.

Armed with these two powerful technologies, the researchers figured out a way to fuse them so that the model could use both approaches when attempting to classify a new image.
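As we read the paper, the fusion works by swapping the image network's final softmax layer for a linear projection into the word-vector space, trained with a hinge rank loss so that the correct label's vector outscores every wrong one by a margin. The toy sketch below illustrates that training signal in pure Python; the 2-D vectors, learning rate, margin and function names are our own hypothetical choices, not Google's code.

```python
# Toy sketch of a DeViSE-style training signal. Sizes, values and names are
# hypothetical; here the word vectors stay fixed and only the projection M learns.

def project(M, v):
    """Apply the linear map M (one row per word-vector dimension) to features v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def hinge_rank_loss(M, features, true_label, label_vectors, margin=0.1):
    """Sum of penalties for every wrong label scoring within `margin` of the true one."""
    p = project(M, features)
    true_score = dot(label_vectors[true_label], p)
    return sum(
        max(0.0, margin - true_score + dot(vec, p))
        for label, vec in label_vectors.items()
        if label != true_label
    )

def sgd_step(M, features, true_label, label_vectors, lr=0.1, margin=0.1):
    """One gradient-descent step on M for a single labelled image."""
    p = project(M, features)
    t_y = label_vectors[true_label]
    true_score = dot(t_y, p)
    grad = [[0.0] * len(features) for _ in M]
    for label, t_j in label_vectors.items():
        # Only wrong labels that violate the margin contribute to the gradient.
        if label != true_label and margin - true_score + dot(t_j, p) > 0:
            for i in range(len(M)):
                for k, x in enumerate(features):
                    grad[i][k] += (t_j[i] - t_y[i]) * x
    return [[M[i][k] - lr * grad[i][k] for k in range(len(features))]
            for i in range(len(M))]

LABELS = {"shark": [1.0, 0.0], "nuclear submarine": [0.0, 1.0]}  # fixed word vectors
features = [1.0, 0.5]          # pretend output of the image network's top layer
M0 = [[0.0, 0.0], [0.0, 0.0]]  # untrained projection
M1 = sgd_step(M0, features, "shark", LABELS)

print(hinge_rank_loss(M0, features, "shark", LABELS))  # 0.1
print(hinge_rank_loss(M1, features, "shark", LABELS))  # 0.0
```

The point of scoring against word vectors, rather than a fixed list of output neurons, is that any word with a vector can in principle be offered as a label at test time.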

This model is marginally more accurate than today's state-of-the-art systems and is inherently more flexible. The researchers hypothesized:

A DeViSE model that was trained on images with labels like "tiger shark", "bull shark", and "blue shark", but never with images labeled simply "shark", would likely have the ability to generalize to this more coarse-grained descriptor because the language model has learned a representation of the general concept of "shark" which is similar to all of the specific sharks. Similarly, if tested on images of highly specific classes which the model happens to have never seen before, for example a photo of an oceanic whitecap shark, and asked whether the correct label is more likely "oceanic whitecap shark" or some other unfamiliar label (say, "nuclear submarine"), our model stands a fighting chance of guessing correctly because the language model ensures that representation of "oceanic whitecap shark" is closer to the representation of sharks the model has seen, while the representation of "nuclear submarine" is closer to those of other sea vessels.

Subsequent experiments detailed in the paper bore out this theory.

Google believes the system has a broad range of applications in some of the search giant's trickiest problem areas.

"We believe that our model's unusual compatibility with larger, less manicured data sets will prove to be a major strength moving forward," the seven wrote. "Though here we trained on a curated academic image dataset, our model's architecture naturally lends itself to being trained on all available images that can be annotated with any text term contained in the (larger) vocabulary. We believe that training massive 'open' image datasets of this form will dramatically improve the quality of visual object categorization systems."

And once Google has honed the tech further, it could be used for a multitude of problems, such as distinguishing between categories like dogs, cats, and lawnmowers, and also between specific entities, such as cars like the "Honda Civic, Ferrari F355, Tesla Model-S", they note. These capabilities are crucial ingredients for further developments in Google's key business of highly targeted, automated advertising.

As oil is to the plastics industry, data is to Google: it is the fundamental resource on which the company depends, and the more it can refine it, the more money it can make from it. For this reason, machine learning and other deep analytical approaches are a priority for Google as the ad-slinger attempts to automate the classification and tagging of an ever-swelling world of digital data. With this system it has devised yet another way to slurp more cash from the ethereal digital world. ®
