Researchers at the University of California, Berkeley, have built a unique robot: one that taught itself how to walk.
We’ve all learned to walk at some point or another. Different animals take different lengths of time to figure it out: our babies take a few months or years to do it, while baby gazelles can do it almost as soon as they are born. And, if we’re to judge from new research, baby robots need around one hour to get the hang of it.
The research is remarkable because this robot, a four-legged device reminiscent of a mechanical puppy, learned to walk by itself, without first being trained in a computer simulation.
First steps
“Teaching robots through trial and error is a difficult problem, made even harder by the long training times such teaching requires,” says Lerrel Pinto, an assistant professor of computer science at New York University, who specializes in robotics and machine learning.
The robot’s feat was made possible by an AI the team designed and christened Dreamer. Dreamer relies on a technique called reinforcement learning, which ‘trains’ algorithms through continuous feedback, rewarding desired actions such as the successful completion of a task. In a sense, this process is similar to how we ourselves learn; in our case, the reward comes as pleasurable chemicals such as dopamine.
The common approach in training robots is to use computer simulations to let them grasp the basics of whatever they are doing before making them attempt the same tasks in the real world.
“The problem is your simulator will never be as accurate as the real world. There’ll always be aspects of the world you’re missing,” says Danijar Hafner, a PhD student in artificial intelligence at the University of Toronto and paper co-author.
What’s special about Dreamer is that it uses past experiences to build a model of the surrounding world, and then conducts trial-and-error calculations in a simulation based on this model. In other words, it can practice its task inside a dream-like mirror of our world (hence the name) by predicting the potential outcomes of the actions it plans to undertake. Armed with this knowledge, it can then try out in the lab what it has learned. It does all of this by itself. Essentially, it is teaching itself.
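To make the idea of practicing inside a learned model more concrete, here is a minimal toy sketch. It is not the team’s code: Dreamer learns a neural-network world model from sensor data, whereas this example swaps in a much simpler, classic technique (a tabular, Dyna-style learner) in a made-up corridor world. Every name in it (real_step, train, GOAL, world_model) is hypothetical and chosen only for illustration.

```python
import random

GOAL = 5            # hypothetical corridor: cells 0..5, reward only at cell 5
ACTIONS = (-1, 1)   # step left or step right


def real_step(state, action):
    """Stand-in for the real world, which is slow and costly to query on a robot."""
    next_state = max(0, min(GOAL, state + action))
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def train(real_steps=500, imagined_per_real=20, seed=0):
    rng = random.Random(seed)
    # q[(state, action)] estimates how much reward that action eventually leads to.
    q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}
    # Learned world model: (state, action) -> (next_state, reward).
    # A lookup table is an assumption made for this toy; Dreamer uses a neural network.
    world_model = {}
    alpha, gamma = 0.5, 0.9  # learning rate and discount factor

    def learn(s, a, r, s2):
        # Standard Q-learning update toward reward plus discounted future value.
        future = 0.0 if s2 == GOAL else max(q[(s2, b)] for b in ACTIONS)
        q[(s, a)] += alpha * (r + gamma * future - q[(s, a)])

    state = 0
    for _ in range(real_steps):
        # 1. Act in the real world (randomly here) and remember what happened.
        action = rng.choice(ACTIONS)
        next_state, reward = real_step(state, action)
        world_model[(state, action)] = (next_state, reward)
        learn(state, action, reward, next_state)

        # 2. "Dream": replay many predicted transitions from the learned model,
        #    without touching the real world again.
        for _ in range(imagined_per_real):
            s, a = rng.choice(list(world_model))
            s2, r = world_model[(s, a)]
            learn(s, a, r, s2)

        # Start over from cell 0 once the goal has been reached.
        state = 0 if next_state == GOAL else next_state
    return q, world_model


q, world_model = train()
# Greedy policy: the preferred action in each non-goal cell.
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]
print(policy)
```

Each real step here is followed by 20 imagined ones, so most of the learning happens inside the model rather than in the world. Once the goal cell has been reached at least once, the greedy policy should settle on stepping right in every cell.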
This approach allows the AI to learn much faster than it could through real-world trial and error alone. At first, all the robot could manage was to wave its legs helplessly in the air. It took around 10 minutes for it to flip over onto its underside, and about 30 minutes to take its first steps. One hour after the experiment began, however, it could easily make its way around the laboratory on steady feet.
In addition to learning how to walk, the robot could then adapt to unexpected situations, such as resisting attempts by one of the team members to topple it.
The results show what deep reinforcement learning can achieve when paired with world models, especially considering that the robot received no prior instruction. Using the two in tandem dramatically cut the traditionally long training times required in trial-and-error reinforcement learning for robots.
Furthermore, removing the need to train robots inside a simulation and letting them practice inside their world models instead could allow them to learn skills in real time, giving them the tools to adapt to unexpected situations such as hardware failures. The approach could also have applications in complex, difficult tasks like autonomous driving.
Using this approach, the team successfully trained three other robots to perform different tasks, such as picking up balls and moving them between trays.
One downside of this approach is that it is extremely time-consuming to set up. Researchers need to specify in their code which behaviors are good, and thus should be rewarded, and which are not. Every task or problem that a robot is meant to solve needs to be broken down into sub-tasks, and each sub-task defined in terms of good or bad. This also makes it very hard to program such an algorithm for unexpected situations.
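As a rough illustration of what specifying good and bad behavior in code can look like, here is a hypothetical reward function for a walking task. It is a sketch under stated assumptions, not the reward the team used: the RobotState fields, the target speed and the way the two terms are combined are all invented for this example.

```python
from dataclasses import dataclass


@dataclass
class RobotState:
    # Both fields are hypothetical stand-ins for real sensor readings.
    forward_velocity: float  # metres per second, positive means forward
    upright: float           # 1.0 = standing level, -1.0 = lying on its back


def walking_reward(state, target_velocity=0.5):
    """Score one moment of behavior: being upright is good, walking forward is better."""
    # Sub-task 1: get off its back. No reward at all while upside down.
    upright_term = max(0.0, state.upright)
    # Sub-task 2: move forward, with no extra credit beyond the target speed.
    speed_term = max(0.0, min(state.forward_velocity, target_velocity)) / target_velocity
    return upright_term * (1.0 + speed_term)


print(walking_reward(RobotState(forward_velocity=0.0, upright=-1.0)))  # on its back
print(walking_reward(RobotState(forward_velocity=0.0, upright=1.0)))   # standing still
print(walking_reward(RobotState(forward_velocity=0.5, upright=1.0)))   # walking at target speed
```

Even this tiny function hard-codes several judgment calls, such as how much walking is worth relative to standing. A situation nobody anticipated, such as a slippery floor, simply has no term in it.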
Furthermore, inaccuracies in the world models these robots use are very damaging to their performance, and constructing reliable world models takes a lot of time and data.
Still, such hurdles were to be expected given the team’s ambitious goal: teaching machines how to adapt to new situations on the fly and use past experience to find solutions on their own. While these are being ironed out, the team plans to make their robot understand spoken commands, and to equip it with cameras so it can better navigate its surroundings, or even play fetch.
The paper “DayDreamer: World Models for Physical Robot Learning” has been published on the preprint server arXiv.