MIT engineers are using large language models (LLMs) to give household robots a measure of common sense. Household robots typically learn tasks by imitating human demonstrations, which leaves them unable to handle real-world interruptions or mistakes without starting over. MIT’s approach connects robot motion data with the common sense knowledge of LLMs, allowing robots to adapt to disruptions partway through a task.
The technique enables robots to break a task into smaller subtasks, so they can carry on after an interruption without manual intervention. Yanwei Wang, a graduate student in MIT’s Department of Electrical Engineering and Computer Science (EECS), points to the limits of blindly mimicking human motion trajectories: small errors can accumulate and derail the task. With the team’s method, robots can correct those errors themselves and complete tasks more reliably.
“Imitation learning is a mainstream approach enabling household robots. But suppose a robot is blindly mimicking a human’s motion trajectories. In that case, tiny errors can accumulate and eventually derail the rest of the execution,” explained Yanwei Wang.
“With our method, a robot can self-correct execution errors and improve overall task success,” he added.
To validate their approach, the researchers taught a robot to scoop marbles from one bowl and pour them into another. Unlike traditional methods, in which a robot imitates one continuous trajectory, MIT’s approach treats a task as a sequence of smaller actions, or subtasks. The robot therefore needs a way to split a task into subtasks and keep track of which one it is in, a job that deep learning models, LLMs in particular, can handle efficiently.
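As a rough illustration of subtask tracking, the sketch below steps through a fixed list of subtasks and, after a disturbance, resumes from the subtask the robot is observed to be in instead of restarting. The labels, the `SubtaskTracker` class, and its methods are hypothetical and chosen only for illustration; in the researchers' system an LLM proposes the subtasks, and the article does not describe their code.

```python
from dataclasses import dataclass, field

# Hypothetical subtask labels for the marble-scooping task.
# In the article an LLM proposes the subtasks; here they are hard-coded.
SUBTASKS = ["reach", "scoop", "transport", "pour"]


@dataclass
class SubtaskTracker:
    """Tracks which subtask of a fixed plan the robot is currently in."""
    subtasks: list
    index: int = 0
    log: list = field(default_factory=list)

    def current(self):
        return self.subtasks[self.index]

    def advance(self):
        """Record the current subtask as done and move to the next one."""
        self.log.append(("done", self.current()))
        if self.index < len(self.subtasks) - 1:
            self.index += 1

    def recover(self, observed):
        """Resume from the subtask the robot is observed to be in."""
        self.log.append(("recover", observed))
        self.index = self.subtasks.index(observed)


tracker = SubtaskTracker(SUBTASKS)
tracker.advance()          # "reach" done
tracker.advance()          # "scoop" done, now in "transport"
tracker.recover("scoop")   # a nudge spilled the marbles: go back to "scoop"
print(tracker.current())
```

The point of the design is that a disturbance moves the tracker back only as far as needed, so the robot repeats one subtask instead of the whole demonstration.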
The team’s method uses an LLM to produce natural language labels for a task’s subtasks, then links each label to the robot’s physical state, given either by its position or by image data, a process termed “grounding.” With the resulting grounding classifiers, the robot can carry out the task on its own, withstand disturbances such as gentle nudging, and self-correct.
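To make the idea of grounding concrete, here is a minimal sketch that maps a robot position to a subtask label with a nearest-centroid rule. This is a stand-in technique, not the team's learned classifier, and the labels, coordinates, and function names are all invented for illustration.

```python
import math

# Hypothetical end-effector positions (x, y, z in metres), each already
# labeled with a subtask. All values are made up for this sketch.
LABELED_STATES = {
    "reach": [(0.10, 0.00, 0.30), (0.12, 0.02, 0.28)],
    "scoop": [(0.20, 0.00, 0.05), (0.22, 0.01, 0.06)],
    "transport": [(0.35, 0.10, 0.25), (0.40, 0.12, 0.27)],
    "pour": [(0.55, 0.20, 0.15), (0.57, 0.21, 0.14)],
}


def centroid(points):
    """Mean position of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


CENTROIDS = {label: centroid(pts) for label, pts in LABELED_STATES.items()}


def ground(state):
    """Return the subtask label whose centroid is closest to the state.

    Nearest-centroid is an assumption made for this sketch; it stands in
    for the learned grounding classifiers described in the article.
    """
    return min(CENTROIDS, key=lambda label: math.dist(state, CENTROIDS[label]))


# After a nudge, the robot checks which subtask its current position matches.
print(ground((0.21, 0.00, 0.05)))
```

With a classifier of this kind, a robot that has been pushed off course can look up which subtask its current state corresponds to and continue from there.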
A key advantage of MIT’s approach is that it turns training data from teleoperation systems into robust robot behavior that can complete complex tasks despite external perturbations. Because LLMs and grounding algorithms handle recovery, humans no longer need to program fixes or give additional demonstrations of how to recover from failures, which makes household robots more adaptable and autonomous.
“With our method, when the robot is making mistakes, we don’t need to ask humans to program or give extra demonstrations of how to recover from failures,” Wang says. “That’s super exciting because there’s a huge effort now toward training household robots with data collected on teleoperation systems. Our algorithm can now convert that training data into robust robot behavior that can do complex tasks despite external perturbations.”
MIT’s research advances household robotics by using LLMs to equip robots with common sense. The approach lets robots handle interruptions and errors on their own and reduces reliance on manual intervention, paving the way for more efficient and autonomous household assistants.