AI bots suck at marking written essays, not too shabby at old Atari games, and more...
The week in AI
Roundup Hello, here's a quick roundup of some announcements from the world of AI this week.
OpenAI researchers reach the highest score yet on the computer game Montezuma's Revenge through reinforcement learning, DeepMind teaches its bots to play Capture the Flag in Quake III Arena, and the US Department of Education is exploring the idea of using AI to mark essays. It's all here in the weekly roundup.
Montezuma’s Revenge is back: OpenAI researchers have trained a bot from a single video demonstration to reach the highest score on Montezuma’s Revenge yet.
The classic Atari game is challenging for reinforcement learning algorithms because rewards are sparse. Agents have to explore and figure out the best combination of moves to rack up points over a much longer stretch of play than in faster-paced Atari games like Breakout.
So it’s very difficult to teach bots to play the game purely through trial and error, and researchers have to train them on human demonstrations instead. OpenAI used its Proximal Policy Optimization algorithm and coaxed its agent to play from states already seen in the training video.
“Our approach works by letting each RL episode start from a state in a previously recorded demonstration. Early on in training, the agent begins every episode near the end of the demonstration. Once the agent is able to beat or at least tie the score of the demonstrator on the remaining part of the game in at least 20 per cent of the rollouts, we slowly move the starting point back in time,” it explained in a blog post.
“We keep doing this until the agent is playing from the start of the game, without using the demo at all, at which point we have an RL-trained agent beating or tying the human expert on the entire game.”
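The schedule described in the quote can be sketched in a few lines of Python. This is a toy illustration, not OpenAI's code: `ToyAgent` is a hypothetical stand-in for a real PPO learner, the demo scores are made up, and the batch size of 50 rollouts is an assumption. Only the 20 per cent threshold and the "start near the end, then move back" rule come from the description above.

```python
import random


class ToyAgent:
    """Hypothetical stand-in for a PPO learner: the more it practises from a
    given starting point, the more likely it is to match the demo from there."""

    def __init__(self, demo_returns, seed=0):
        self.demo_returns = demo_returns
        self.practice = [0] * len(demo_returns)
        self.rng = random.Random(seed)

    def run_episode(self, start):
        # One "training" episode that begins at demo step `start`.
        self.practice[start] += 1
        p_success = min(1.0, self.practice[start] / 100)
        if self.rng.random() < p_success:
            return self.demo_returns[start]  # ties the demonstrator
        return 0


def backward_curriculum(demo_returns, run_episode, rollouts=50,
                        threshold=0.2, step=1, max_batches=10_000):
    """demo_returns[i] is the score the demonstrator collected from step i to
    the end of the recording. Returns the final starting point reached."""
    start = len(demo_returns) - 1  # begin near the end of the demonstration
    for _ in range(max_batches):
        scores = [run_episode(start) for _ in range(rollouts)]
        # Count rollouts that beat or tie the demo on the remaining part.
        wins = sum(score >= demo_returns[start] for score in scores)
        if wins / rollouts >= threshold:
            if start == 0:
                return 0  # playing from the start of the game, no demo needed
            start = max(0, start - step)  # move the starting point back in time
    return start


demo = [400, 300, 200, 100, 0]  # made-up remaining-score values
agent = ToyAgent(demo)
print(backward_curriculum(demo, agent.run_episode))
```

The point of the design is that the agent only ever has to discover a short stretch of new play at a time, which sidesteps the sparse-reward problem that makes exploring from scratch so painful.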
The agent reached a score of 74,500. DeepMind also had a crack at Montezuma’s Revenge recently using YouTube videos for training and reached a score of 41,098.
Now, here’s Capture the Flag: Speaking of DeepMind, researchers across the pond have taught a team of bots to play the old Quake III Arena game in Capture the Flag mode.
In the game, players compete in two teams. The goal is to take the other team’s flag whilst protecting your own. Players can chase after opponents in order to tag them and send them back to their spawning point. The team that has captured the most flags within five minutes wins.
DeepMind hosted informal matches with 40 human players, who were split into teams containing bots as both teammates and enemies. The researchers found that the bots “exceed the win-rate of the human players,” and were seen as being more collaborative than human players.
Instead of focusing on a single bot, the researchers trained a population of agents to play with each other. Each agent learns its own reward signal and can generate its own goal, whether that be capturing a flag or protecting its own.
Dubbed the For The Win agent, it reaches high performance levels, beating human players and other reinforcement learning methods after playing more than 150,000 training games. Agents have the advantage of lightning-fast reactions, so they were quicker at tagging opponents, but they also learnt certain strategies like following teammates around the map or camping near the opponent’s territory.
Intel and Baidu working together: Intel announced a range of collaborations with Baidu during the Baidu Create conference in Beijing this week:
- Xeye - a camera aimed at retailers who want to analyse objects and detect people using facial recognition. It uses Intel’s Movidius vision processing unit chips.
- Baidu Cloud and FPGAs - Baidu Cloud users can now access Intel’s FPGAs to handle AI workloads.
- PaddlePaddle and Xeons - Baidu’s AI framework PaddlePaddle now supports Intel’s Xeon Scalable processors.
“From enabling in-device intelligence, to providing data center scale on Intel Xeon Scalable processors, to accelerating workloads with Intel FPGAs, to making it simpler for PaddlePaddle developers to code across platforms, Baidu is taking advantage of Intel’s products and expertise to bring its latest AI advancements to life,” said Gadi Singer, vice president and architecture general manager at Intel’s Artificial Intelligence Products Group.
Robo-graders get an F: The thought of AI marking exam essays should ring alarm bells, but apparently the US Department of Education is thinking about doing exactly that.
Machines have been offered as a solution to sniff out fake news or moderate hateful internet comments, but they never work because they are bloody awful at actually understanding content. Just look at the so-called “smart” digital assistants like Siri, Google Home, or Amazon’s Alexa.
For some reason this doesn’t faze people at the Department of Education, according to a recent NPR clip.
“Department Of Education Deputy Commissioner Jeff Wulfson cited ‘huge advances in artificial intelligence in the last few years’ and cracked, ‘I asked Alexa whether she thought we'd ever be able to use computers to reliably score tests, and she said absolutely.’” Oh dear.
Luckily, teachers have been pushing back, arguing that machine marking will be rigid and will ignore creativity and expression.
Here’s a short paragraph that gets top marks from an algorithm.
"History by mimic has not, and presumably never will be precipitously but blithely ensconced. Society will always encompass imaginativeness; many of scrutinizations but a few for an amanuensis. The perjured imaginativeness lies in the area of theory of knowledge but also the field of literature. Instead of enthralling the analysis, grounds constitutes both a disparaging quip and a diligent explanation."
Mind you, that text has been generated by an algorithm too. Known as Babel (Basic Automatic B.S. Essay Language), it creates sentences peppered with impressive-sounding words, and even includes a comma or two now and again and full stops at the end of sentences. But it’s complete gobbledygook and doesn’t mean a thing. By those standards El Reg articles would probably score a big fat 0.
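To see why long words alone can fool a scorer, here is a deliberately crude sketch. It assumes a hypothetical grader, `naive_score`, that rewards nothing but average word length. No real robo-grader or Babel itself is claimed to work this way; it only shows how a surface feature can rank nonsense above a sentence that actually says something.

```python
import re


def naive_score(essay: str) -> float:
    """Hypothetical surface-feature score: the mean word length of the essay."""
    words = re.findall(r"[A-Za-z']+", essay)
    if not words:
        return 0.0
    return sum(len(word) for word in words) / len(words)


# A sentence from the generated paragraph quoted above.
gibberish = ("Society will always encompass imaginativeness; many of "
             "scrutinizations but a few for an amanuensis.")
# A plain sentence written for this example.
plain = "Computers can count words, but they cannot tell if an essay makes sense."

print(naive_score(gibberish) > naive_score(plain))
```

Any grader leaning on proxies like this will be gamed by text built to hit those proxies, which is exactly what the gobbledygook generator does.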
Things might not be so bad if a robo-grader is paired with a fact checker and a human reader. But there are still issues around which signals should trigger a handoff to the human reader. ®