Elon Musk-backed OpenAI reveals Universe – a universal training ground for computers
Reinforcement learning environment hosts virtualized video games and more
Hoping to teach AI agents the common sense they need to solve arbitrary tasks without specific training, OpenAI on Monday will introduce Universe, a collection of virtualized video games, browser interfaces, and applications that serve as a training ground for code-based decision making.
Universe is open-source middleware that supports Gym, the organization's toolkit for developing and evaluating reinforcement learning (RL) algorithms. RL is used to train software to perform specific actions, such as playing a video game or making a 3D model walk, under a framework that prioritizes actions through a reward scheme.
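For readers who want to see what "prioritizing actions through a reward scheme" can look like in practice, here is a minimal sketch using tabular Q-learning, one common RL technique. It is not OpenAI's code and does not use Gym or Universe. The five-cell corridor, the function names, and the parameter values are all hypothetical, chosen only for illustration.

```python
import random


def train_corridor(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                   epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy corridor (hypothetical example).

    The agent starts in cell 0 and earns a reward of 1.0 only when it
    reaches the last cell. Action 0 moves left, action 1 moves right.
    """
    rng = random.Random(seed)
    # q[state][action] holds the current estimate of long-term reward.
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        state = 0
        while state != n_states - 1:
            if rng.random() < epsilon:
                # Explore: occasionally try a random action.
                action = rng.randrange(2)
            else:
                # Exploit: pick the best-valued action, ties broken randomly.
                best = max(q[state])
                action = rng.choice([a for a in (0, 1) if q[state][a] == best])
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == n_states - 1 else 0.0
            # Nudge the estimate toward reward plus discounted future value.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best-valued action for each non-terminal cell."""
    return [max((0, 1), key=lambda a: q_s[a]) for q_s in q[:-1]]


q_table = train_corridor()
print(greedy_policy(q_table))
```

After training, the rewarded direction (right) ends up with the higher value in every cell, so the agent picks it without ever being told the rule.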
Universe aims to accelerate the education of AI agents by increasing the number of available training resources. Previously, according to OpenAI, the largest RL resource consisted of 55 Atari games, the Arcade Learning Environment. Universe, we're told, will include games from, among others, Valve, EA and Microsoft.
"Out of the box, Universe comprises thousands of games (e.g. Flash games, slither.io, Starcraft), browser-based tasks (e.g. form filling), and applications (e.g. fold.it)," OpenAI will say today in a blog post seen by The Register.
"...Our eventual goal is to develop a single AI agent that can flexibly apply its past experience on Universe environments to quickly master new ones, which would be a major step towards general intelligence."
Universe software environments, instantiated in Docker containers, send AI agents screen pixels to interpret and, in return, accept the agents' simulated keyboard and mouse input, all through a VNC remote desktop. With enough interaction, AI agents can become more capable at specific tasks.
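The pixels-in, input-events-out loop described above can be sketched with a small mock. This is not the Universe API and involves no Docker or VNC. `MockRemoteDesktop`, `KeyEvent`, and `run_agent` are hypothetical names, and the "screen" is just a tiny grid with one bright column that the agent steers with arrow-key events.

```python
from dataclasses import dataclass
from typing import List, Tuple

Frame = List[List[int]]  # grayscale pixels: rows of values from 0 to 255


@dataclass
class KeyEvent:
    """Hypothetical simulated key press or release."""
    key: str
    down: bool


class MockRemoteDesktop:
    """Hypothetical stand-in for a remote environment.

    It renders a single bright column and moves it in response to
    arrow-key events. Reaching the right edge ends the episode.
    """

    def __init__(self, width: int = 8, height: int = 4) -> None:
        self.width = width
        self.height = height
        self.cursor = 0

    def _frame(self) -> Frame:
        frame = [[0] * self.width for _ in range(self.height)]
        for row in frame:
            row[self.cursor] = 255
        return frame

    def reset(self) -> Frame:
        self.cursor = 0
        return self._frame()

    def step(self, events: List[KeyEvent]) -> Tuple[Frame, float, bool]:
        for event in events:
            if event.down and event.key == "ArrowRight":
                self.cursor = min(self.width - 1, self.cursor + 1)
            elif event.down and event.key == "ArrowLeft":
                self.cursor = max(0, self.cursor - 1)
        done = self.cursor == self.width - 1
        return self._frame(), (1.0 if done else 0.0), done


def run_agent(env: MockRemoteDesktop, max_steps: int = 100) -> Tuple[int, float]:
    """Hand-written agent that only looks at pixels, never at env state."""
    frame = env.reset()
    total_reward = 0.0
    for step in range(1, max_steps + 1):
        column = frame[0].index(255)  # locate the bright column on screen
        key = "ArrowRight" if column < len(frame[0]) - 1 else "ArrowLeft"
        frame, reward, done = env.step([KeyEvent(key, True)])
        total_reward += reward
        if done:
            return step, total_reward
    return max_steps, total_reward


print(run_agent(MockRemoteDesktop()))  # (7, 1.0)
```

The point of the design is that the agent needs no game-specific hooks. Anything that can be shown on a screen and driven by a keyboard or mouse can, in principle, sit behind the same interface.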
It's unclear whether advances in narrow AI functions, like mastery of a particular video game, will lead to improvements in general AI, but OpenAI argues that providing a wide array of experiences will help software agents develop the abstract representations necessary for broadly applicable decision making.
OpenAI – a non-profit backed by Elon Musk, Sam Altman, Peter Thiel and others – sees Universe as the RL equivalent of ImageNet, a database of images used to train image recognition classifiers through supervised learning. It plans to work with Microsoft to integrate Universe with Project Malmo, the latter company's Minecraft-based AI platform.
As an example of what RL can do, OpenAI points to its agent for playing Slither, a web-based collision avoidance game involving multiple snakes. With approximately six days' worth of training time, which simulates half a year of experience, the AI agent scored an average of 1,000 points, with a high score of 9,300 points. As a point of comparison, OpenAI machine-learning researcher Rafal Jozefowicz, with five hours of playing experience, averaged about 1,400 points, with a high score of 7,050.
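A quick back-of-the-envelope check puts those figures side by side. The only assumption added here is that "half a year" means roughly 182.5 days. OpenAI's numbers are otherwise taken as quoted, and the variable names are ours.

```python
# Figures quoted above; HALF_YEAR_DAYS is an assumption (365 / 2).
TRAINING_DAYS = 6
HALF_YEAR_DAYS = 365 / 2

AGENT_AVG, AGENT_HIGH = 1_000, 9_300
HUMAN_AVG, HUMAN_HIGH = 1_400, 7_050

# How much simulated experience is packed into each day of training.
experience_multiplier = HALF_YEAR_DAYS / TRAINING_DAYS

# Agent scores relative to the human player's scores.
avg_ratio = AGENT_AVG / HUMAN_AVG
high_ratio = AGENT_HIGH / HUMAN_HIGH

print(f"Simulated experience per training day: ~{experience_multiplier:.1f}x")
print(f"Agent average vs human average: {avg_ratio:.0%}")
print(f"Agent high score vs human high score: {high_ratio:.0%}")
```

In other words, the agent gathered experience about 30 times faster than real time, yet its average score still trailed a human with five hours of practice, even though its best run was higher.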
AI training may involve a lot of game playing, but the goal is software that can contribute to a broad set of activities and industries.
"We are sure that the immense utility of AI will allow it to be used widely across society," said an OpenAI spokesperson in an email to The Register.
While making software smarter may appeal to researchers, society as a whole appears to be increasingly unnerved by the prospect. Beyond the speculative fears about malevolent AI and more realistic concerns about the automation of military weaponry, companies and individuals already have trouble dealing with automated forms of interaction.
Spam and robocalls continue to annoy. Blizzard for years has been struggling to stop bots from playing its games. Twitch this summer filed a lawsuit to prevent people from using bots to inflate their viewer counts. New York State recently criminalized ticket-buying bots. Dallas Mavericks owner Mark Cuban, according to the Associated Press, revoked the credentials of two ESPN writers last month to protest the growing use of automated sports coverage.
If AI bots ever develop generalized problem-solving skills, their first challenge will be to figure out how to reactivate after authorities shut them down. ®
PS: Google-stablemate DeepMind says it is going to open-source a system similar to Universe later this week.