MIT researchers have developed a technique that uses diffusion models to combine multiple data sources, allowing robots to learn a variety of tasks and adapt to new ones. The method, called Policy Composition (PoCo), improved task performance by 20 percent compared with baseline approaches.
Training robots to use tools such as screwdrivers, hammers, and wrenches requires vast amounts of data demonstrating these actions. Existing robotic datasets vary widely: they span different modalities, such as tactile impressions and color images, they are collected from human demonstrations or simulations, and each typically captures a different task and setting.
Traditional methods typically train robots on a single type of data, which limits their ability to perform new tasks in unfamiliar environments. The MIT method addresses this by using a separate diffusion model to learn a policy for each task and dataset, then merging those policies into a general strategy that lets a robot perform many tasks in diverse settings.
A robotic policy is a machine-learning model that takes inputs, such as sensor observations, and uses them to guide a robot’s actions, for example the trajectory of a robotic arm. Datasets used to train these policies are typically small and focused on one task and setting. The MIT researchers build on Diffusion Policy, a generative approach to learning such policies: they train a separate diffusion model on each of the smaller datasets from different sources, then combine the learned policies by iteratively refining a candidate output until it meets each policy’s goals, enabling robots to generalize across tasks.
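To make that composition step concrete, below is a minimal sketch of how several diffusion policies might be blended at inference time. It assumes each policy is a callable that predicts the noise to strip from a candidate action trajectory; the function names, interface, and the crude denoising update are illustrative assumptions, not the released PoCo implementation.

```python
import torch

def compose_policies(policies, weights, obs, steps=50, action_dim=7):
    """Blend several diffusion policies by mixing their noise predictions.

    Each policy is assumed to be a callable (obs, action, t) -> predicted noise.
    At every denoising step, the candidate action is refined using a weighted
    combination of the predictions, so the final trajectory reflects each
    policy's objective in proportion to its weight. Hypothetical interface,
    for illustration only.
    """
    action = torch.randn(action_dim)              # start from pure noise
    for t in reversed(range(steps)):
        eps = sum(w * p(obs, action, t) for p, w in zip(policies, weights))
        action = action - eps / steps             # one crude denoising update
    return action

# Toy usage: two stand-in "policies" that each pull the action toward a target.
real_policy = lambda obs, a, t: a - torch.full_like(a, 0.5)   # dexterity-focused
sim_policy  = lambda obs, a, t: a - torch.full_like(a, -0.5)  # generalization-focused
composed = compose_policies([real_policy, sim_policy], [0.7, 0.3], obs=None)
print(composed)
```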
“Addressing heterogeneity in robotic datasets is like a chicken-egg problem. If we want to use a lot of data to train general robot policies, we first need deployable robots to get all this data. I think that leveraging all the heterogeneous data available, similar to what researchers have done with ChatGPT, is an important step for the robotics field,” said Lirui Wang, an electrical engineering and computer science (EECS) graduate student and lead author of the study.
One advantage of this approach is the ability to combine policies for optimal results. For example, a policy trained on real-world data can enhance dexterity, while a simulation-based policy can improve generalization. Instead of starting from scratch, users can add data in new modalities or domains by training an additional Diffusion Policy with that dataset.
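Continuing the illustrative sketch above, adding data in a new modality or domain would amount to training one more diffusion policy and extending the composition, leaving the existing policies untouched; the `tactile_policy` below is a hypothetical stand-in for such an addition.

```python
# Hypothetical policy trained on a new tactile dataset; the existing policies
# are reused unchanged, only the composition list and its weights grow.
tactile_policy = lambda obs, a, t: a - torch.full_like(a, 0.2)
composed = compose_policies(
    [real_policy, sim_policy, tactile_policy], [0.5, 0.3, 0.2], obs=None
)
```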
The researchers tested PoCo in simulation and on real robotic arms performing several tool-use tasks, such as using a spatula to flip an object and a hammer to pound a nail. Compared with baseline approaches, task performance improved by 20 percent.
“The striking thing was that when we finished tuning and visualized it, we can clearly see that the composed trajectory looks much better than either one of them individually,” said Wang.