Toyota RI - Large Behavior Models for Robot Generalization
Breakthrough? Using Large Behavior Models to help robots learn new dexterous behaviors from demonstration
“We now have robots that can converse with people, but can’t open a bag of chips”
Introduction
In my recent paper review on trajectory learning using RoboAgent, I speculated about how reasoning engines could help robots generalize learned skills faster, more cheaply, and more robustly.
Now it seems that Toyota Research Institute (TRI) has used a similar idea to take a huge leap forward: it has developed a new method for teaching robots new skills using a diffusion model.
And that is massively exciting.
Executive Summary
Most industrial robots are programmed to do one thing and to do that one thing well. They can't easily learn new skills unless an engineer writes the instructions for them and/or trains the robot using expensive training hardware.
And even then, the same robot cannot easily be used for the same task in a completely different physical setting (scene), or for a similar task in the same setting.
The market for industrial robots is massive (projected to reach USD 142.8 billion by 2032, growing at a CAGR of 11.4% from 2023 to 2032), and a solution to this problem could establish a whole new group of category leaders.
Now, TRI has developed a diffusion-based method to teach robots new skills that is faster, more versatile, and more reliable than before.
The research team showed that the robots learned 60 difficult, dexterous skills like pouring liquids, using tools, and manipulating deformable objects (!) without anyone writing a single line of new code (!!), only by providing new data to the robot.
Who is Toyota Research Institute?
The Toyota Research Institute (TRI) is a research and development organization established by Toyota Motor Corporation, one of the world's largest automobile manufacturers. TRI was established in 2015 with the aim of advancing artificial intelligence (AI), robotics, and autonomous vehicle technologies to improve the safety, accessibility, and quality of life for people. (Their words, not mine.)
TRI focuses on various research areas, including:
Automated Driving: Developing self-driving technologies to make vehicles safer and more efficient.
Robotics: Advancing the field of robotics to improve the quality of life, especially for the elderly and people with disabilities.
Materials Science: Researching new materials.
Problem
As I mentioned before, generalization is a core problem in machine learning in general and in robotics specifically. No two situations are alike, just as no two days are the same. If we truly want to have autonomous driving or robot assistants, then we need to find a way for our robots to become far more versatile than our friendly Roomba.
Secondly, teaching robots new skills is incredibly expensive. Robot engineers are rare and highly paid, and they use their expert knowledge to code new skills into expensive hardware that has zero tolerance for failure.
Even if we apply learning from demonstration, a human teleoperator still needs hours to perform the hundreds of demonstrations that produce the data the robot needs to understand the poses and trajectories it should perform.
And even then, evaluating whether the robot has learned the task well requires running it on the actual physical hardware. That makes any hyperparameter optimization impractical, because every candidate setting has to be tested on the real robot.
So what can be done?