mimic-video Uses a Pretrained Video Model to Teach Robots With 10x Less Data
- mimic-video is a video-action model that reduces data needs for robot learning.
- Tests showed mimic-video learns tasks with 10x less data and converges twice as fast as typical VLA models.
mimic-video is a new robot control system from teams at mimic robotics, Microsoft Zurich, ETH Zurich, the ETH AI Center, and UC Berkeley. The team says the video-action model (VAM) helps robots learn faster and with less training data, particularly by reducing the need for large-scale teleoperation compared to traditional vision-language-action (VLA) models.
Source: YouTube / mimic
mimic-video builds on a pretrained video model, NVIDIA Cosmos-Predict2, which is trained to predict how scenes change over time. Its predictions serve as a visual plan, which a smaller action decoder translates into robot movements. The video and action components are trained on separate schedules, so each can be optimized independently. This approach differs from traditional VLA models, which build on vision-language models (VLMs) trained on static image and text data and typically require large amounts of teleoperation data for fine-tuning.
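The paper's exact architecture and training recipe aren't reproduced here, but the basic split between a frozen video predictor and a small trainable action decoder can be sketched roughly as below. Everything in the sketch, including the module names, dimensions, the placeholder backbone standing in for Cosmos-Predict2, and the action-chunk output format, is an illustrative assumption rather than mimic-video's actual implementation.

```python
import torch
import torch.nn as nn


class VideoActionModel(nn.Module):
    """Rough sketch of a video-action model (VAM): a frozen, pretrained video
    predictor produces a latent "visual plan", and a small trainable action
    decoder maps that plan to a chunk of robot actions."""

    def __init__(self, video_predictor: nn.Module, latent_dim: int = 1024,
                 action_dim: int = 14, chunk_len: int = 16):
        super().__init__()
        self.video_predictor = video_predictor
        for p in self.video_predictor.parameters():   # keep the backbone frozen
            p.requires_grad_(False)

        # Small action decoder, trained on its own schedule.
        self.action_decoder = nn.Sequential(
            nn.Linear(latent_dim, 512),
            nn.GELU(),
            nn.Linear(512, action_dim * chunk_len),
        )
        self.action_dim, self.chunk_len = action_dim, chunk_len

    @torch.no_grad()
    def plan(self, obs_frames: torch.Tensor) -> torch.Tensor:
        # "Visual plan": latent prediction of how the scene will evolve.
        return self.video_predictor(obs_frames)

    def forward(self, obs_frames: torch.Tensor) -> torch.Tensor:
        visual_plan = self.plan(obs_frames)
        actions = self.action_decoder(visual_plan)
        return actions.view(-1, self.chunk_len, self.action_dim)


# Placeholder backbone standing in for Cosmos-Predict2 (hypothetical):
# it just flattens 4 frames of 3x32x32 video into a 1024-d latent.
video_backbone = nn.Sequential(nn.Flatten(), nn.Linear(4 * 3 * 32 * 32, 1024))

vam = VideoActionModel(video_backbone)
# Only the action decoder's parameters are optimized, mirroring the idea of
# training the video and action parts on separate schedules.
optimizer = torch.optim.AdamW(vam.action_decoder.parameters(), lr=1e-4)

obs = torch.randn(2, 4, 3, 32, 32)         # (batch, frames, C, H, W)
pred = vam(obs)                            # -> (2, 16, 14) action chunk
demo_actions = torch.zeros_like(pred)      # stand-in for teleop demo labels
loss = nn.functional.mse_loss(pred, demo_actions)
loss.backward()
optimizer.step()
```

The appeal of this split is that the data-hungry part, learning how scenes evolve, comes pretrained from large-scale video, so only the lightweight decoder needs robot demonstrations.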
The model was evaluated in both simulation and on real robots, including dual-arm systems and dexterous humanoid hands. In these tests, it learned with 10x less data and converged about twice as fast as standard VLA baselines. When the decoder was given ground-truth video instead of model predictions, success rates rose, indicating that control performance depends on the quality of the video model.
🌀 Tom’s Take:
mimic-video uses video models to simplify robot learning, showing that visual prediction can reduce the need for large teleoperation datasets.