Google Unveils Vision-to-Action AI Models to Power Next-Gen Robots

- Google introduced Gemini Robotics 1.5 (VLA) and Gemini Robotics-ER 1.5 (VLM) to combine high-level reasoning with vision-guided physical action in real-world robotic tasks.
- The models coordinate planning and action across different robot types, with Gemini Robotics-ER 1.5 now available via the Gemini API and Gemini Robotics 1.5 offered to select partners.

Google has introduced two robotics-focused models, Gemini Robotics 1.5 and Gemini Robotics-ER 1.5, designed to bring high-level reasoning and vision-guided physical action into real-world environments. Built on the Gemini architecture, the models form an agentic system that links perception, planning, and action.
Gemini Robotics-ER 1.5, a vision-language model (VLM), handles high-level decision-making such as generating step sequences and retrieving online information. It passes instructions to Gemini Robotics 1.5, a vision-language-action model (VLA), which interprets visual input to perform those steps through motor control. Working together, the models support tasks that require both semantic reasoning and precise physical execution, such as object sorting and tool use.
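The announcement does not include code, but the division of labor it describes maps onto a simple orchestration loop: a reasoning model turns an instruction into ordered steps, and an action model executes each step against the robot's camera and motors. The sketch below is purely illustrative; `plan_steps`, `act`, and `RobotInterface` are hypothetical stand-ins, not part of Google's API.

```python
from dataclasses import dataclass
from typing import Any, Protocol


class RobotInterface(Protocol):
    """Hypothetical robot abstraction: camera frames in, motor commands out."""

    def capture_frame(self) -> bytes: ...
    def send_motor_commands(self, commands: list[float]) -> None: ...


@dataclass
class Orchestrator:
    """Illustrative VLM -> VLA handoff, mirroring the described agentic loop."""

    planner: Any  # stands in for the reasoning model (an ER-style VLM)
    actor: Any    # stands in for the action model (a VLA driving the robot)

    def run(self, instruction: str, robot: RobotInterface) -> None:
        # 1. High-level reasoning: break the instruction into ordered steps.
        steps: list[str] = self.planner.plan_steps(instruction)

        # 2. Vision-guided execution: for each step, the action model reads the
        #    current camera frame and emits motor commands to carry it out.
        for step in steps:
            frame = robot.capture_frame()
            commands = self.actor.act(step, frame)
            robot.send_motor_commands(commands)
```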
Both models can operate across different robotic platforms. Gemini Robotics 1.5 can transfer learned behaviors between robots without requiring a specialized model for each one, supporting embodiments as varied as the ALOHA 2 robot, Apptronik’s humanoid Apollo, and the bi-arm Franka robot.
Gemini Robotics-ER 1.5 is now accessible to developers through the Gemini API in Google AI Studio. Gemini Robotics 1.5 is currently available to select partners.
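For developers trying ER 1.5 through the Gemini API, a minimal request could look like the following sketch using the `google-genai` Python SDK. The model ID string, image path, and prompt are assumptions for illustration; check Google AI Studio for the exact identifier and supported inputs.

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

# Read a workspace image for the model to reason over (path is illustrative).
with open("workspace.jpg", "rb") as f:
    image_bytes = f.read()

response = client.models.generate_content(
    # Model ID assumed; confirm the exact identifier in Google AI Studio.
    model="gemini-robotics-er-1.5-preview",
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
        "List the steps needed to sort the blocks on the table by color.",
    ],
)

print(response.text)
```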
Source: YouTube / Google DeepMind
🌀 Tom’s Take:
Google’s latest release focuses on foundational models that aim to let robots of all types act without much human intervention or hard-coding.
Source: Google DeepMind