AI + ML


Wanna build an AI robot? Don't have an actual robot yet? Try this Holodeck for droids

OpenAI emits more simulation environments for toolkit Gym

Mon 26 Feb 2018 // 19:35 UTC

OpenAI today updated Gym – its system for training intelligent software – so that developers can teach physical robots to hold pens, pick up and move objects, and so on.

Gym, launched in 2016, is a toolkit for teaching programs new tricks, such as playing Atari games and balancing poles, via reinforcement learning (RL). Now, OpenAI has added a bunch of simulated environments designed to train physical robots how to move and interact with things around them, albeit in a virtual world.

For example, the simulated environments can be used to teach robotic fingers to play an instrument, or to pick up and lift an object from a table. This is useful for folks interested in rapidly training intelligent robots over thousands of exercises, without having to rig up a relatively slow-moving physical bot, or before they have a chance to get hold of the hardware.

This Star Trek-style Holodeck approach is much faster and easier than training a robot in a physical environment – the resulting model can, of course, be later used to control a real-world machine, when it's ready.

Peter Welinder, a researcher at OpenAI, told The Register that “just as a real gym has different ‘environments’ – like a treadmill, a bench press, an exercise bike, and so on – the OpenAI Gym has environments for AI agents such as ‘make a toy figure walk’ or ‘make a car run up a slope.’”

Specifically, the latest environments simulate a Fetch robotic arm to push stuff around, and a ShadowHand to grip and manipulate things with robotic fingers.

All the new robotics environments use sparse rewards to train the software. Typically, RL models are rewarded little by little as they get closer to their goal. The reward encourages the software, and indicates it is gradually learning to do the right thing. Sparse rewards, on the other hand, are only given when the code completes its goal.


It's the difference between telling a computer to make a sandwich and giving it reward points for getting two slices of bread, then more points for grabbing some ham, then more points for layering them – and just giving it points for the sandwich when it's done.

“Let's take the arm pushing the puck as an example,” said Welinder. “It tries to do some motion randomly, like just hitting the puck from the side. In the traditional RL setting, an oracle would give the agent a reward based on how close to the goal the puck ends up. The closer the puck is to the goal, the bigger the reward. So, in a way, the oracle tells the agent ‘you're getting warmer.’

“Sparse rewards essentially push this paradigm to the limit: the oracle only gives a reward if the goal is reached. The oracle doesn't say ‘you're getting warmer’ anymore. It only says: ‘You succeeded,’ or ‘You failed.’ This is a much harder setting to learn in, since you're not getting any intermediate clues.”
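To make the distinction concrete, here is a minimal Python sketch of the two reward styles for the puck example. It is an illustration only, not code from Gym: the function names, the 2-D coordinates, the `tolerance` threshold, and the choice of 1.0 for success and 0.0 for failure are all assumptions.

```python
import math

def distance(puck, goal):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(puck[0] - goal[0], puck[1] - goal[1])

def dense_reward(puck, goal):
    # "You're getting warmer": the reward rises as the puck nears the goal.
    return -distance(puck, goal)

def sparse_reward(puck, goal, tolerance=0.05):
    # Only "you succeeded" (1.0) or "you failed" (0.0), with no hints between.
    # The tolerance is a hypothetical threshold for counting as "at the goal".
    return 1.0 if distance(puck, goal) <= tolerance else 0.0

goal = (1.0, 0.0)
for puck in [(0.0, 0.0), (0.5, 0.0), (0.9, 0.0), (1.0, 0.0)]:
    print(puck, round(dense_reward(puck, goal), 2), sparse_reward(puck, goal))
```

In this sketch the dense reward climbs steadily as the puck approaches, while the sparse reward stays flat until the puck lands within the tolerance, which is why random exploration gets so little to go on.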

Sparse reward learning is supposed to mirror the conditions for training robots in the real world. “For example, if I want my robot to pour wine into a glass I just tell it ‘this is how much wine there should be in the glass,’” said Welinder. “I don't want to have to tell it ‘first grab the bottle, then lift it up, tip it over the glass edge, pour until it reaches this level, hold for two seconds, stop.’”

To train robots via sparse reward, OpenAI has also released code called Hindsight Experience Replay (HER), an RL algorithm that learns by replaying and assessing its performance after attempting to complete a task.
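The core trick behind hindsight replay can be sketched in a few lines of Python. This is a heavily simplified illustration, not OpenAI's released code: the 1-D world, the dictionary layout, and every function name here are hypothetical. The idea shown is that a failed attempt is stored a second time with its goal swapped for whatever the agent actually reached, so even a miss yields some rewarded experience to learn from.

```python
def sparse_reward(achieved, goal):
    # Success only if the achieved position matches the goal exactly.
    return 1.0 if achieved == goal else 0.0

def run_episode(start, goal, actions):
    """Roll out integer moves on a 1-D line and record each transition."""
    transitions = []
    position = start
    for action in actions:
        next_position = position + action
        transitions.append({
            "state": position,
            "action": action,
            "next_state": next_position,
            "goal": goal,
            "reward": sparse_reward(next_position, goal),
        })
        position = next_position
    return transitions

def relabel_with_hindsight(transitions):
    """Copy the episode, pretending the final position was the goal all along."""
    hindsight_goal = transitions[-1]["next_state"]
    return [
        {**t, "goal": hindsight_goal,
         "reward": sparse_reward(t["next_state"], hindsight_goal)}
        for t in transitions
    ]

# The agent wanted to reach 5 but only wandered as far as 2: every reward is 0.
episode = run_episode(start=0, goal=5, actions=[1, 1, -1, 1])
# Store the original attempt plus the relabeled copy for later replay.
replay_buffer = episode + relabel_with_hindsight(episode)
```

Here the original episode carries no reward at all, while the relabeled copy contains successes, because the agent did reach position 2 even though that was not what it was asked to do.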

Since the environments are open source, other developers can customize them to introduce new robot motions or different objects. OpenAI has also published a list of research ideas for developers interested in improving the HER algorithm, on page six of this technical report. ®
