Human StarCraft II e-athletes crushed by neural net ace – DeepMind's AlphaStar
Computers 10 - 1 Humanity
Analysis AlphaStar, DeepMind’s latest AI bot, crushed professional gamers playing popular strategy video game StarCraft II during a demonstration broadcast online Thursday.
DeepMind has had its eyes on StarCraft II for a while. Back in 2016, it announced a partnership with StarCraft creator Blizzard Entertainment to work together to crack the game using AI.
There are three playable “races” in the game: Zerg, Protoss, and Terran. Each race comes with different units that have various abilities. Players have to dispatch units to the right locations to perform a series of tasks such as mining resources or building walls to fend off opponent units.
The game requires strategic planning to execute the right sequence of moves quickly. Now, less than three years after the Blizzard collaboration was announced, the British research lab finally has something to show for it: AlphaStar.
StarCraft and computer science nerds were yesterday treated to a demonstration of AlphaStar in action. First, the bot was up against Dario Wünsch, better known as professional StarCraft player “TLO”. The demo videos were actually replays from when TLO paid a visit to DeepMind’s London office back in December last year. Although only three of the five matches were shown, AlphaStar thrashed its opponent 5-0.
All games were 1v1 matches, where each side plays as the Protoss race on the Catalyst LE map, a simple layout where all the locations of enemy bases are marked out beforehand. It should also be noted that whilst TLO is a pro player ranked 44th in the world, he doesn’t specialise in the Protoss race. So, he isn’t quite the best player to find AlphaStar’s upper limit in these particular settings.
Step in Grzegorz Komincz, also known as “MaNa”. Ranked 13th in the world, MaNa is a stronger player than his teammate TLO, and specialises in playing as the Protoss race. Again, MaNa was pitted against the AI bot over five games in the same format. And again, the human was defeated 5-0, although not all matches were shown.
But all was not lost for the humans. In the single live match streamed during the broadcast, MaNa managed to convincingly beat AlphaStar despite a strong opening from the agent. The commentators said that during its matches with TLO, AlphaStar seemed to play more daringly, carrying out wackier strategies not usually seen in normal human play. When it played MaNa, however, it seemed to settle down into a more conventional style of play and lost.
A league of bots
So, how does AlphaStar work? Well, it’s not completely clear since DeepMind has yet to publish a paper with all the nitty-gritty details. But here’s what we know so far. AlphaStar is based on a long short-term memory (LSTM) network architecture and is trained using a mixture of supervised learning and reinforcement learning.
It receives, as input, a raw interface describing the current state of the game with a list of available units and their properties. All this information is processed by the neural network to come up with a list of possible actions, such as choosing where to build a wall. It then chooses the best sequence of moves based on how likely those actions are to help it win the game, and executes them in the game as output.
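As a rough illustration of that observe, score and act loop, here is a minimal Python sketch. Everything in it is hypothetical: the `Unit` fields, the three candidate actions and the hand-written `score_action` function are stand-ins for what, in AlphaStar, is a trained neural network whose details DeepMind has not yet published.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    # Hypothetical unit description; the real game interface is far richer.
    name: str
    health: int
    owner: str  # "self" or "enemy"


def candidate_actions(units):
    """List the actions available given the observed units (toy rules)."""
    actions = ["mine_resources", "build_wall"]
    if any(u.owner == "enemy" for u in units):
        actions.append("attack")
    return actions


def score_action(action, units):
    """Stand-in for the network: a made-up estimate of how much an action helps."""
    enemies = sum(1 for u in units if u.owner == "enemy")
    own = sum(1 for u in units if u.owner == "self")
    if action == "attack":
        return 0.9 if own > enemies else 0.2
    if action == "build_wall":
        return 0.7 if enemies >= own else 0.3
    return 0.5  # mining is a steady default


def choose_action(units):
    """Pick the highest-scoring available action for the current state."""
    return max(candidate_actions(units), key=lambda a: score_action(a, units))
```

Swapping `score_action` for a learned model is where all the real difficulty lies; the surrounding loop stays this simple only because the toy state has three fields.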
The neural network learned how to play the game by observing real StarCraft II matches. Using imitation learning, it learned to copy the strategies seen in those games, which got it to a level good enough to beat the “Elite” level computer bot in 95 per cent of the matches played.
Next, the system entered a “multi-agent reinforcement learning process,” where several versions of the bot played against each other in a league to rack up experience as quickly as possible.
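To make the league idea concrete, here is a toy Python sketch of the scheduling part only: every bot plays every other bot each round, and wins are tallied. It is not DeepMind's training code. The numeric skill ratings, the Elo-style win probability and the function names are all assumptions made for illustration, and no learning takes place.

```python
import itertools
import random


def play_match(skill_a, skill_b, rng):
    """Toy match: returns True if A wins, using Elo-style odds (an assumption)."""
    p_a_wins = 1.0 / (1.0 + 10 ** ((skill_b - skill_a) / 400.0))
    return rng.random() < p_a_wins


def run_league(skills, rounds, seed=0):
    """Round-robin league over the bots in `skills`; returns wins per bot."""
    rng = random.Random(seed)  # seeded so a run can be repeated exactly
    wins = {name: 0 for name in skills}
    for _ in range(rounds):
        # Every pair of bots meets once per round.
        for a, b in itertools.combinations(skills, 2):
            if play_match(skills[a], skills[b], rng):
                wins[a] += 1
            else:
                wins[b] += 1
    return wins
```

In a real league each match would also feed a reinforcement learning update, so the bots change between rounds; here the ratings stay fixed to keep the sketch short.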
All the reinforcement learning training took place over 14 days, with each bot racking up as much as 200 years of gameplay experience - far more than any human pro could manage in a lifetime. It’s very computationally expensive: each virtual agent required about 16 of Google’s TPU v3 chips, equivalent to a whopping 50 GPUs, according to David Silver, co-lead of the AlphaStar project and research scientist at DeepMind.
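A quick back-of-the-envelope calculation shows what those two figures imply together. The short Python snippet below only does the arithmetic on the 200 years and 14 days quoted above; the function name and the 365.25-day year are our own choices, and the result is an aggregate rate that says nothing about how many games ran in parallel.

```python
DAYS_PER_YEAR = 365.25  # average year length, an assumption for the estimate


def speedup_factor(experience_years, wall_clock_days):
    """How many days of gameplay were experienced per real day of training."""
    return experience_years * DAYS_PER_YEAR / wall_clock_days


factor = speedup_factor(200, 14)
print(f"Roughly {factor:,.0f} times faster than real time")
```

That works out to a little over 5,200 days of play for every day on the clock.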
The top strategies were collected into five different bots, which were chosen to play against TLO. Each one performs slightly differently, so it was harder for the human pros to adapt to AlphaStar’s playing style. The system was retrained, and another set of five bots was selected to play against MaNa. After the system is trained, it can run on a single desktop GPU.
"Of course, this was achieved through using enormous amounts of computation power," Julian Togelius, associate professor in the department of computer science and engineering at the New York University Tandon School of Engineering, told The Register. "Meaning that it's hard to replicate for anyone who isn't a major tech company or just want to burn inordinate amounts on games."
AlphaStar has a few advantages compared to its human opponents. It played a slightly dated version of the game “geared for research” rather than for entertainment, and could glance at the whole map. Everything not hidden by the fog of war, including its own units and its enemy’s, can be seen at once. Humans don’t get this luxury, and instead only see bits and pieces by clicking on parts of the map.
Interestingly, when this advantage was removed during the live match, AlphaStar was defeated by MaNa. Quick reaction times are needed to adapt to different scenarios in the game and professional players are known for their dexterity, clicking away at the mouse and keyboard at lightning speeds.
TLO and MaNa can execute many hundreds of actions per minute (APM). AlphaStar performs about 280 APM on average, making it slower than the pros, but it can execute its actions more precisely. It also has a reaction time of about 350 milliseconds.
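To put the APM figure in more familiar units, the Python snippet below converts it into the average gap between actions. The helper name is our own, and the calculation assumes actions are spread evenly over the minute, which real play, human or bot, is not.

```python
def seconds_per_action(apm):
    """Average time between actions, assuming they are evenly spaced."""
    return 60.0 / apm


gap = seconds_per_action(280)
print(f"About {gap * 1000:.0f} ms between actions at 280 APM")
```

At 280 APM that is an action roughly every 214 milliseconds on average.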
It's all just still a game
So, does it mean that we, humans, should just give up StarCraft since it’s been solved? Not quite.
It looks like AlphaStar can be defeated by playing far-out strategies it hasn’t seen before. TLO felt that if he played more matches then he would eventually start winning. There are also lots of variations of the game that it hasn’t been tested against, such as playing multiplayer team matches using different races and other maps.
DeepMind is hellbent on connecting all the dots in its research to achieve AGI one day. StarCraft may just be a game, but the Alphabet-owned firm believes it has all the components needed to test and develop machine intelligence. CEO Demis Hassabis said he believed the code could be used to help predict the weather.
3/3 While StarCraft is ‘just’ a (very complex!) game, I’m excited that the techniques behind #AlphaStar could be useful in other problems such as weather prediction & climate modeling, which also involve predictions over very long sequences. Peer-reviewed paper is underway.— Demis Hassabis (@demishassabis) January 24, 2019
We’ll believe it when we see it. ®
You can watch all the match replays here.