Generative AI has already shown great promise in various robotic applications, such as natural language interactions, robot learning, no-code programming, and design. The Google DeepMind Robotics team has recently demonstrated another potential intersection of these disciplines: navigation.
In a paper titled “Mobility VLA: Multimodal Instruction Navigation with Long-Context VLMs and Topological Graphs,” the team describes how it used Google Gemini 1.5 Pro to teach a robot to follow commands and navigate an office space. Using some of the Every Day Robots that have been around since Google shut down that project amid layoffs, DeepMind showcased the work in a series of videos.
In the demonstration videos, DeepMind employees initiate interactions with the robots using the command, “OK, Robot,” followed by specific tasks. For instance, one employee asks the robot to take them to a place for drawing. The robot, adorned with a jaunty yellow bowtie, responds, “OK, give me a minute. Thinking with Gemini …” and then leads the person to a wall-sized whiteboard. In another example, a different individual instructs the robot to follow directions written on the whiteboard. The robot processes the command and navigates a long route to the “Blue Area,” which is a robotics testing area, confidently announcing its successful completion of the task.
Before these demonstrations, the robots were familiarized with the office space using a method called “Multimodal Instruction Navigation with demonstration Tours (MINT).” This involves guiding the robot around the office and identifying various landmarks through speech. Subsequently, the team employs hierarchical Vision-Language-Action (VLA), which combines environment understanding with common sense reasoning. This allows the robot to respond to written and drawn commands, as well as gestures.
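The article does not spell out how the two levels fit together, so the following is only a toy sketch of the general idea suggested by the paper's title: a high-level step picks a goal from the demonstration tour, and a low-level step finds a route through a topological graph. Everything here is a hypothetical stand-in: the landmark labels, the graph, and the function names `pick_goal_frame` and `shortest_path` are invented for illustration, and a simple keyword match takes the place of the Gemini model.

```python
from collections import deque

# Hypothetical demonstration tour: frame index -> landmark named aloud (or None).
TOUR_LABELS = {0: "entrance", 1: None, 2: "whiteboard", 3: None, 4: "blue area"}

# Hypothetical topological graph: nodes are tour frames, edges link frames
# the robot can move between directly.
GRAPH = {0: [1], 1: [0, 2, 4], 2: [1, 3], 3: [2, 4], 4: [3, 1]}


def pick_goal_frame(instruction, labels):
    """Stand-in for the high-level step: a keyword match, NOT a real VLM call."""
    text = instruction.lower()
    for frame, label in labels.items():
        if label and label in text:
            return frame
    return None  # no known landmark mentioned


def shortest_path(graph, start, goal):
    """Low-level step: breadth-first search for the fewest-hop route."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the parent links to rebuild the route.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph[node]:
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None  # goal unreachable


goal = pick_goal_frame("Take me to the whiteboard", TOUR_LABELS)
route = shortest_path(GRAPH, 0, goal)
print(goal, route)
```

In this sketch the split mirrors the hierarchy described above: the first function decides where to go, and the second decides how to get there. A real system would replace the keyword match with a multimodal model and the hop-count search with motion planning over actual robot poses.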
This approach shows how generative AI can make robotic navigation more intuitive and more adaptable in complex environments such as office spaces. The integration of Gemini into these robots marks a notable step forward for robotics and points toward more sophisticated interactions and functionality in the future.