Gemini Robotics Models Unlock General-Purpose Dexterity in Real-World Robots

  • Google DeepMind’s new Gemini Robotics models enable robots to perform unseen, multi-step tasks with high dexterity.
  • The models integrate multimodal inputs with physical action, supporting embodied reasoning and vision-language-action control.

Google DeepMind has introduced a new family of Gemini Robotics models that enable robots to understand and perform complex physical tasks using natural language. These models are fine-tuned versions of Gemini 2.0, adapted with robot-specific data to allow real-time interactions with unfamiliar objects and environments. One test showed a robot executing a “slam dunk” with a toy hoop and ball it had never seen, based solely on the verbal prompt.

“Our mission is to build embodied AI to power robots that help you with everyday tasks in the real world,” said Carolina Parada, Head of Robotics at Google DeepMind, on the Google Blog. “Eventually, robots will be just another surface on which we interact with AI, like our phones or computers — agents in the physical world.”

The lineup includes Gemini Robotics and Gemini Robotics-ER. The former is a vision-language-action model that advances robotic dexterity by solving multi-step tasks with smooth execution. The latter, built on Gemini 2.0 Flash, focuses on embodied reasoning: recognizing objects, predicting trajectories, and generating executable actions. To foster generalization, the team trained on a wide range of tasks rather than drilling a single task repeatedly.

Robots powered by these models have packed lunchboxes, folded an origami fox, and cleaned whiteboards. They’ve been tested on various embodiments—from academic research arms to commercial humanoid systems—demonstrating versatility across tasks and platforms.

🌀 Tom's Take:

The rise of humanoid robots highlights how spatial intelligence, powered by sensors and perception systems, is essential to real-world AI. Google DeepMind showcases this with Gemini Robotics, demonstrating how multimodal models give embodied agents a richer, more actionable understanding of the world.


Source: Google Blog