Is that you, T-1000? No, just a lil robot that can mimic humans on sight

Don't worry, it's not as terrifying as it sounds

This has nothing to do with the real experiment – we just thought it was a cool picture of a robot woman watching TV

Boffins have taught a robot how to imitate the way someone handles objects after watching them just once.

While humans and animals are intelligent enough to mimic simple movements they've only just seen, this is beyond today's relatively dumb software. We're nowhere near T-1000 Terminator series levels, yet.

Researchers from the University of California, Berkeley, in the USA, have made some progress on this front by teaching code controlling a robot arm and hand to perform three tasks after seeing the same action performed by a human arm: grabbing an object and placing it in a specific position; pushing an object; and pushing and pulling an object.

Think picking up stuff, such as a toy, and placing it on a box, pushing a little car along a table, and so on.

The technique, described in a paper out this week, has been dubbed “one-shot imitation.” And, yes, it requires a lot of training before it can start copycatting people on demand. The idea is to educate the code to the point where it can immediately recognize movements, or similar movements, from its training, and replay them.

A few thousand videos depicting a human arm and a robot arm completing movements and actions are used to prime the control software. The same actions are repeated using different backgrounds, lighting effects, objects, and human arms to increase the depth of the machine-learning model's awareness of how the limbs generally operate, and thus increase the chances of the robot successfully imitating a person on the fly.


Chelsea Finn, a PhD student, and Tianhe (Kevin) Yu, an undergraduate student, both at the UC Berkeley Artificial Intelligence Research group, explained to The Register on Wednesday: “The human demos allow [the robot] to learn how to learn from humans. Using the human demos – just a video of human performing the task – the robot adapts to the task shown in the demonstration.”

The training videos are converted into sequences of still images and fed into a convolutional neural network that maps the pictured actions to the possible movements that can be performed by the robot arm and its claw, so that it builds up an understanding of how to position itself to imitate movements caught on camera. It also learns the features of objects, such as colors and shapes, so that it knows how to identify and grasp them.
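To make the frame-to-movement mapping above a little more concrete, here is a minimal sketch in plain Python. It is not the researchers' network: the filters, the readout weights, and every function name (`conv2d`, `frame_to_action`, `video_to_actions`) are hypothetical, chosen only to show the general shape of the idea, which is that each still image passes through convolutional filters and a small readout layer then turns the resulting features into a movement command.

```python
def conv2d(image, kernel):
    """Slide a small kernel over a 2D image ('valid' cross-correlation)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Keep positive filter responses, zero out the rest."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def global_avg_pool(feature_map):
    """Collapse a feature map to one number: its mean response."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)


def frame_to_action(frame, kernels, weights, biases):
    """Map one still image to a movement vector (hypothetical readout)."""
    features = [global_avg_pool(relu(conv2d(frame, k))) for k in kernels]
    return [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]


def video_to_actions(frames, kernels, weights, biases):
    """Treat a video as a sequence of stills, one action per frame."""
    return [frame_to_action(f, kernels, weights, biases) for f in frames]
```

In a real system the filter values and readout weights would be learned from the training videos rather than written by hand, and the network would have many more layers; the sketch only shows how pixels flow through to a command.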

Crucially, the robot should be able to cope with new objects it hasn't seen during training; simply watching a person handle an arbitrary thing should be enough for it to twig how it should move its joints to pick up and move the item in an identical fashion.

It learns via a process called meta-learning. This is not the same as supervised learning, which is typically used in deep-learning research and involves training systems to perfect a narrow, single task and testing the software by giving it an example that it hasn’t seen before.

“Meta-learning is learning to learn a wide range of tasks quickly and efficiently. By applying meta-learning to robotics, we hope to enable robots to be generalists like humans, rather than mastering only one skill,” Finn and Yu said. “Meta-learning is particularly important for robotics, since we want robots to operate in a diverse range of environments in the real world.”

“In essence, the robot learns how to learn from humans using this data. After the meta-training phase, the robot can acquire new skills by combining its learned prior knowledge with one video of a human performing the new skill,” they, and their fellow academics, added in their paper.
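For the curious, the "learning to learn" idea can be sketched on a toy problem. The article does not spell out the algorithm, so what follows is an assumption: a gradient-based meta-learning loop in which an inner step adapts to one demonstration and an outer step tunes the starting point so that this single adaptation works well. The tasks here are made-up straight lines through the origin with different slopes, nothing like robot video, and all names (`adapt`, `meta_train`, `sample_slope`) are hypothetical.

```python
import random


def predict(w, x):
    """Toy model: a line through the origin with slope w."""
    return w * x


def loss_grad(w, x, y):
    """Gradient of the squared error (w*x - y)**2 with respect to w."""
    return 2.0 * x * (predict(w, x) - y)


def adapt(w, demo, inner_lr):
    """Inner loop: one gradient step on a single demonstration."""
    x, y = demo
    return w - inner_lr * loss_grad(w, x, y)


def sample_slope(rng):
    """Hypothetical task family: slopes drawn between 1 and 3."""
    return rng.uniform(1.0, 3.0)


def meta_train(sample_task, steps=2000, inner_lr=0.1, outer_lr=0.02, seed=0):
    """Outer loop: tune the initial w so one-shot adaptation works well."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        slope = sample_task(rng)
        xs, xq = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        demo, query = (xs, slope * xs), (xq, slope * xq)
        w_adapted = adapt(w, demo, inner_lr)
        # Chain rule through the inner step: dw_adapted/dw = 1 - 2*lr*xs^2.
        d_adapted_d_w = 1.0 - 2.0 * inner_lr * xs * xs
        w -= outer_lr * loss_grad(w_adapted, *query) * d_adapted_d_w
    return w
```

Calling `meta_train(sample_slope)` yields a starting slope, and `adapt` then specializes it to a new line from a single example. The analogy to the robot is loose: meta-training plays the part of the thousands of training videos, and the one-step `adapt` call plays the part of watching a single new human demo.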

Meta-training to meta-testing

After the robot has been trained, it can use inference to imitate a human after watching a clip it hasn’t seen before. You can see the robot in action here:

[YouTube video]

At first glance, the movements of the human and the robot may look slightly different. That's because the robot may not have picked up on subtle or minute hand and finger gestures, or may have been thrown off by the lighting and background. However, the overall task is completed in pretty much the same way.

The robot arm can’t learn a motion completely from scratch on demand: it needs to have seen something similar during training. In tests, though, it manages to push, place, and pick up the right objects more than 70 per cent of the time. When it fails, it picks the wrong object or the wrong motion.

It’s also more likely to fail to copy humans when the video depicts a new background, which shows that the robot's brain is somewhat preoccupied by patterns in its environment that are not particularly important to the task at hand.

Deep learning is data hungry, and the researchers reckon collecting more data across a diverse range of backgrounds during training will reduce the failure rate. There were also a number of motion faults that occurred for all backgrounds, so the learning algorithms controlling the robot also have to be improved.

The team believes experiments like these will help refine robots that have to select the correct product or other object from a collection of things. At the moment, the team uses three different models of robot arms for each task. They hope to integrate these all into one model, so that a single robot can perform all the different chores, and to increase the complexity of the tasks. ®


Biting the hand that feeds IT © 1998–2019