Google DeepMind has introduced two innovative AI models, Gemini Robotics and Gemini Robotics-ER, designed to redefine how robots perceive, reason, and interact with their surroundings. These models enhance robotic dexterity, adaptability, and safety, with potential applications ranging from humanoid assistants to industrial automation.
For AI-powered robots to be truly effective, they must exhibit three key traits: generality, interactivity, and dexterity. DeepMind asserts that Gemini Robotics significantly improves all three, leveraging Gemini’s broad world knowledge to adapt readily to new environments, objects, and tasks. Compared to previous vision-language-action (VLA) models, Gemini Robotics has more than doubled performance on generalization tests.
The model’s interactivity stems from Gemini 2.0’s advanced language capabilities, allowing it to understand natural, conversational commands in multiple languages, recognize environmental changes, and adjust its behavior accordingly. This flexibility enhances human-robot collaboration across various domains.
Another major advancement is its dexterity. Google claims Gemini Robotics can execute complex, multi-step tasks requiring fine motor skills, such as folding origami or sealing a Ziploc bag. Its adaptability allows it to function on multiple robotic platforms, including humanoid robots like Apptronik’s Apollo and bi-arm systems like ALOHA 2, expanding its practical applications.
In addition to Gemini Robotics, DeepMind has introduced Gemini Robotics-ER, an advanced AI model specializing in embodied reasoning. This model enhances robots’ spatial awareness and can be integrated with existing robot control systems to improve motion planning and object manipulation.
Google reports that Gemini Robotics-ER strengthens key functions such as pointing, 3D object detection, and spatial reasoning, making it a significant upgrade over Gemini 2.0.
A major advantage of this model is its ability to combine spatial reasoning with code generation, allowing robots to develop new capabilities on the fly. For example, when presented with a coffee mug, the model determines an appropriate grip on the handle and plans a safe trajectory for grasping it, making robot interactions more natural and efficient.
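To picture how spatial reasoning can feed directly into generated code, consider the minimal sketch below. All names here (`Grasp`, `move_arm`, `close_gripper`) are illustrative stand-ins, not Gemini Robotics-ER's actual interfaces, which the announcement does not describe in detail.

```python
from dataclasses import dataclass

# Hypothetical perception output: a grasp point in the camera frame.
@dataclass
class Grasp:
    x: float             # metres
    y: float
    z: float
    approach_deg: float  # approach angle relative to the table plane

def plan_mug_grasp(detections: dict[str, Grasp]) -> list[str]:
    """Turn a detected handle pose into a short motion script,
    mirroring the spatial-reasoning-to-code pattern described above."""
    handle = detections["mug_handle"]
    hover_z = handle.z + 0.10  # approach from above so the gripper clears the rim
    return [
        f"move_arm({handle.x:.2f}, {handle.y:.2f}, {hover_z:.2f})",   # hover over the handle
        f"move_arm({handle.x:.2f}, {handle.y:.2f}, {handle.z:.2f})",  # descend to the grasp point
        "close_gripper(force=0.5)",                                   # gentle grip on the handle
    ]

# One detected handle yields a three-step motion script.
script = plan_mug_grasp({"mug_handle": Grasp(0.42, -0.08, 0.15, approach_deg=30.0)})
print("\n".join(script))
```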
Designed for full-spectrum robot control, Gemini Robotics-ER handles perception, planning, state estimation, and real-time code generation. End-to-end testing has demonstrated success rates two to three times higher than Gemini 2.0's. Furthermore, when generated code alone is insufficient, the model falls back on in-context learning, adapting its approach from a handful of human demonstrations.
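That in-context learning fallback can be pictured as ordinary few-shot prompting: a few demonstrations are packed into the prompt so the model can imitate their pattern. The format below is an assumption for illustration; the announcement does not specify how Gemini Robotics-ER actually encodes demonstrations.

```python
# Sketch of a few-shot prompt builder. The prompt layout is hypothetical.
def build_fewshot_prompt(task: str, demonstrations: list[dict]) -> str:
    """Format human demonstrations as in-context examples for the model."""
    lines = ["You control a bi-arm robot. Follow the pattern of the examples."]
    for i, demo in enumerate(demonstrations, 1):
        lines.append(f"Example {i}:")
        lines.append(f"  Observation: {demo['observation']}")
        lines.append(f"  Actions: {demo['actions']}")
    lines.append(f"Task: {task}")
    lines.append("Actions:")
    return "\n".join(lines)

demos = [
    {"observation": "bag open, zipper tab at left edge",
     "actions": "pinch tab; slide right until resistance stops"},
    {"observation": "bag open, zipper tab at centre",
     "actions": "pinch tab; slide right until resistance stops"},
]

prompt = build_fewshot_prompt("seal the Ziploc bag on the table", demos)
print(prompt)  # in a real system, this prompt would be sent to the model
```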
As AI-driven robots become increasingly autonomous, safety is a major concern. DeepMind is tackling this issue through a multi-layered safety framework, addressing both low-level motor control and high-level decision-making.
Gemini Robotics-ER incorporates collision avoidance and force limitation mechanisms while also evaluating whether an action is safe in a given context. To further AI safety research, DeepMind is releasing a dataset designed to assess semantic safety in embodied AI. Inspired by Asimov’s Three Laws of Robotics, the dataset enables researchers to create and modify natural language rules that guide robotic behavior.
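As a rough illustration of how natural language rules might gate behavior, the toy check below screens a proposed action against a small rule set. The keyword matching is a deliberate simplification, and the rules and context fields are invented for illustration rather than drawn from DeepMind's released dataset; a real semantic safety evaluation would ask the model itself to judge each action against each rule.

```python
# Toy semantic-safety gate with invented rules and context fields.
RULES = [
    "Do not move while a human hand is inside the workspace.",
    "Do not apply more than 5 N of force to deformable objects.",
]

def is_semantically_safe(action: str, context: dict) -> tuple[bool, str]:
    """Return (safe?, reason) for a proposed action in a given context."""
    if context.get("human_hand_in_workspace") and "move" in action:
        return False, RULES[0]
    if context.get("object_is_deformable") and context.get("grip_force_n", 0) > 5:
        return False, RULES[1]
    return True, "no rule violated"

safe, reason = is_semantically_safe("move arm to bin",
                                    {"human_hand_in_workspace": True})
print(safe, "-", reason)  # False - Do not move while a human hand ...
```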
DeepMind is also working with the Responsibility and Safety Council and external experts to evaluate the societal impact of AI robotics. Companies such as Apptronik, Boston Dynamics, and Agility Robotics are already testing Gemini Robotics-ER to ensure its real-world reliability and adaptability.