DeepMind gets good at games (and choosing them) – plus more bits and bytes from the world of machine learning
Including: AI exercises in Safety Gym
Roundup If you can't get enough of machine learning news then here's a roundup of extra tidbits to keep your addiction ticking away. Read on to learn more about how DeepMind is helping Google's Play Store, and about a new virtual environment from OpenAI for training agents safely.
An AI recommendation system for the Google Play Store: DeepMind are using machine learning to help Android users find new apps in the Google Play Store.
“We started collaborating with the Play store to help develop and improve systems that determine the relevance of an app with respect to the user,” the London-based lab said this week.
Engineers built a model known as a candidate generator. Based on a transformer architecture more commonly used to process natural language, the system picks up patterns in the kinds of apps users have previously installed from the Play Store and predicts which app is most likely to interest them.
The model assigns an “importance weighting” to each app, a value calculated from a user’s previous app downloads and from how many times that particular app has been downloaded in the virtual store. The balance allows more niche apps to shine through if they are particularly tailored to a specific use, alongside apps that are genuinely popular.
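To make the balancing act concrete, here is a minimal sketch of one way a popularity discount could work. It is not DeepMind's formula: the function `importance_weight`, the logarithmic discount, the `alpha` knob, and the sample apps and numbers are all assumptions made up for illustration.

```python
import math

def importance_weight(user_affinity: float, install_count: int, alpha: float = 0.5) -> float:
    """Hypothetical weighting (an assumption, not DeepMind's method).

    user_affinity: how well the app matches the user's install history (0..1).
    install_count: how many times the app has been downloaded overall.
    alpha: 0 ignores popularity; larger values favour niche apps more.
    """
    if install_count < 0:
        raise ValueError("install_count must be non-negative")
    # Discount very popular apps so they do not drown out niche ones.
    popularity_discount = 1.0 / (1.0 + math.log1p(install_count)) ** alpha
    return user_affinity * popularity_discount

# Made-up apps: (affinity for this user, global install count).
apps = {
    "mega_chat": (0.60, 50_000_000),
    "bird_id_guide": (0.55, 20_000),
}

ranked = sorted(apps, key=lambda name: importance_weight(*apps[name]), reverse=True)
print(ranked)
```

With the discount switched on, the niche app edges ahead of the blockbuster despite a slightly lower affinity; setting `alpha=0.0` ranks purely on affinity.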
All this is fed as input to a reranker system, which learns the “relative importance” between a pair of apps that have been presented to the user. Those learned preferences are what the final recommendations are built on.
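Learning from pairs of apps is commonly done with a pairwise ranking loss. The sketch below shows that general technique with a tiny linear scorer. It is an assumption about how such a reranker might be trained, not a description of the Play Store system, and the two-number feature vectors are invented.

```python
import math

def score(weights, features):
    """Linear relevance score for one app."""
    return sum(w * x for w, x in zip(weights, features))

def pairwise_loss(weights, preferred, other):
    """Logistic loss: small when the preferred app outscores the other one."""
    margin = score(weights, preferred) - score(weights, other)
    return math.log1p(math.exp(-margin))

def sgd_step(weights, preferred, other, lr=0.1):
    """One gradient step on the pairwise loss."""
    margin = score(weights, preferred) - score(weights, other)
    grad_scale = -1.0 / (1.0 + math.exp(margin))  # d(loss) / d(margin)
    return [w - lr * grad_scale * (p - o) for w, p, o in zip(weights, preferred, other)]

# Invented pairs: (features of the app the user installed, features of the app they skipped).
pairs = [
    ([1.0, 0.2], [0.3, 0.9]),
    ([0.8, 0.1], [0.2, 0.7]),
]

weights = [0.0, 0.0]
initial_loss = sum(pairwise_loss(weights, p, o) for p, o in pairs)
for _ in range(200):
    for preferred, other in pairs:
        weights = sgd_step(weights, preferred, other)
final_loss = sum(pairwise_loss(weights, p, o) for p, o in pairs)
print(initial_loss, final_loss)
```

The scorer never learns an absolute rating for an app, only which of two candidates should sit higher, which is all a ranked list needs.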
“We know users get the most out of their phone when they have apps and games they love, and that it’s exciting to discover new favourites,” according to a DeepMind statement.
“In collaboration with Google Play, our team that leads on collaborations with Google has driven significant improvements in the Play Store's discovery systems, helping to deliver a more personalised and intuitive Play Store experience for users.”
You can read more about how the candidate generator recommender system works here.
A new algo can play Atari and Go more efficiently than AlphaZero: Here’s more news from DeepMind: its researchers have developed a reinforcement learning algorithm capable of learning how to play simple Atari video games as well as the more complicated board games Go, Chess, and Shogi.
Known as MuZero, the algorithm is based on pairing tree search techniques with a model trained on a specific environment or game. The details were described in a technical paper on arXiv this week.
Reinforcement learning agents learn how to complete a specific goal through trial and error. Their actions are guided by virtual rewards that shape their overall strategy, or policy. MuZero is essentially a planning algorithm that receives visual input, whether it's an image of a chess board or a still from an Atari video game, and transforms the information into a “hidden state”.
The hidden state constantly changes and is updated based on previous hidden states to predict the next action an agent should take in a game. “At every one of these steps the model predicts the policy (e.g. the move to play), value function (e.g. the predicted winner), and immediate reward (e.g. the points scored by playing a move),” the paper explained.
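The loop described above can be sketched in a few lines. In the paper the three components are learned neural networks; here they are replaced with toy arithmetic stand-ins so the example runs on its own, and the tree search that MuZero wraps around the model is left out entirely. The function bodies, the three-action setup, and the sample observation are all assumptions for illustration.

```python
import math
from typing import Dict, List, Tuple

HiddenState = Tuple[float, ...]

def representation(observation: List[float]) -> HiddenState:
    """Toy stand-in for the learned encoder: observation -> hidden state."""
    return tuple(x / (1.0 + abs(x)) for x in observation)

def dynamics(state: HiddenState, action: int) -> Tuple[HiddenState, float]:
    """Toy stand-in: (hidden state, action) -> (next hidden state, immediate reward)."""
    next_state = tuple(0.9 * s + 0.1 * (action + 1) for s in state)
    reward = sum(next_state) / len(next_state)
    return next_state, reward

def prediction(state: HiddenState, num_actions: int) -> Tuple[List[float], float]:
    """Toy stand-in: hidden state -> (policy over actions, value estimate)."""
    logits = [sum(state) * (a + 1) for a in range(num_actions)]
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]  # numerically stable softmax
    total = sum(exps)
    policy = [e / total for e in exps]
    value = math.tanh(sum(state))
    return policy, value

def unroll(observation: List[float], actions: List[int], num_actions: int = 3) -> List[Dict]:
    """Encode once, then step forward in hidden-state space only."""
    state = representation(observation)
    steps = []
    for action in actions:
        policy, value = prediction(state, num_actions)
        state, reward = dynamics(state, action)
        steps.append({"policy": policy, "value": value, "reward": reward})
    return steps

steps = unroll([0.5, -1.0, 2.0], actions=[0, 2, 1])
for step in steps:
    print(step)
```

Note that after the first call the real observation is never consulted again: every later prediction comes from the model's own hidden state, which is why no game simulator or rulebook is needed while planning.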
What’s interesting is that the model seems to achieve state-of-the-art performance across 57 Atari games and matches AlphaZero in playing Go, Chess, and Shogi too. It’s more general than AlphaZero, and doesn’t require explicit knowledge of the rules of the games.
“Crucially, our method does not require any knowledge of the game rules or environment dynamics, potentially paving the way towards the application of powerful learning and planning methods to a host of real world domains for which there exists no perfect simulator,” the paper concluded.
It should be noted, however, that MuZero struggles with some games like Montezuma’s Revenge, Tennis, and Pitfall.
NYC mayor wants an algo management and policy officer to oversee the city’s AI technology: Mayor Bill de Blasio is setting up a new role to ensure algorithms used by the local government are deployed efficiently and transparently, and that the decisions they inform are accountable.
The new algorithm management and policy officer will be in charge of examining the government’s technology. “The Officer, reporting to the director of Operations, will serve as a centralized resource to help guide the City and its agencies in the development, responsible use, and assessment of algorithmic and related technical tools and systems ("algorithmic tools and systems"), and for engaging and educating the public on issues related to City use of these and other related technologies,” according to an executive order published this week.
That’s all fine and dandy, but if you examine the order further there’s a tiny clause stating that the scrutiny won’t extend to law enforcement, meaning that the NYPD will be exempt.
“No information that is required to be disclosed or reported by this Order will be done so in a manner that would violate any applicable provision of federal, state, or local law or that would interfere with a law enforcement investigation or other investigative activity by an agency or would compromise public safety,” the order reads.
Developing AI agents safely in simulation: OpenAI have released a virtual environment dubbed Safety Gym to help developers build machine learning bots that take safety into account.
The idea rests on something called “constrained RL,” where the reinforcement learning algorithms used to train an agent restrict its actions somewhat.
An example OpenAI gives is the training of self-driving cars. With Safety Gym, a simple task such as getting a virtual car from point A to point B in the quickest time possible would also require the car to take traffic safety into account along the way.
“A big problem with normal RL is that everything about the agent’s eventual behavior is described by the reward function, but reward design is fundamentally hard,” it explained. Poor design can lead to agents exploiting loopholes, finding odd tricks to rack up rewards. “In constrained RL, we don’t have to pick trade-offs - instead, we pick outcomes, and let algorithms figure out the trade-offs that get us the outcomes we want,” it added.
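One standard way to "pick outcomes" rather than trade-offs is a Lagrangian method: fix a cost limit and let a multiplier adjust itself until the limit is met. The toy below illustrates that idea and nothing more. The one-number "policy", the reward and cost formulas, and the function name `train_constrained` are assumptions, not Safety Gym code.

```python
def train_constrained(cost_limit: float = 0.25, steps: int = 2000, lr: float = 0.05):
    """Toy primal-dual loop (assumed setup, for illustration only).

    The "agent" picks a speed in [0, 1].
    Reward = speed (faster is better); cost = speed ** 2 (faster is riskier).
    Goal: maximise reward while keeping cost at or below cost_limit.
    """
    speed = 1.0        # start out reckless
    multiplier = 0.0   # the trade-off weight, learned rather than hand-picked
    for _ in range(steps):
        # Primal step: gradient ascent on reward - multiplier * cost.
        speed += lr * (1.0 - 2.0 * multiplier * speed)
        speed = min(1.0, max(0.0, speed))
        # Dual step: raise the penalty while over the limit, relax it otherwise.
        multiplier = max(0.0, multiplier + lr * (speed ** 2 - cost_limit))
    return speed, multiplier

speed, multiplier = train_constrained()
print(speed, multiplier)
```

Nobody tells the loop how heavily to penalise risk. It is handed only the outcome, a cost of at most 0.25, and the multiplier drifts to whatever value holds the speed at the constrained optimum of 0.5.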
Safety Gym involves navigating a robot, car, or dog through a series of messy environments littered with objects that it has to avoid while completing simple tasks like pushing a button. Developers can test their RL algorithms in the virtual environment. ®