ByteDance Releases Depth Anything 3, a Simpler Way to Reconstruct 3D Scenes from Images or Video

  • ByteDance Seed Team released Depth Anything 3, a vision model that reconstructs 3D scenes from single images, multi-view inputs, or video.
  • According to the research team, the model sets a new state of the art in pose estimation and depth accuracy.

Depth Anything 3 is a new vision model from the ByteDance Seed Team, which was founded in 2023 to build advanced AI foundation models. The model reconstructs 3D scenes from visual input, whether a single image, a set of images, or video. The team recently released the project as open source via the project site, GitHub, and Hugging Face, along with an interactive demo and a technical report.

The model is built on a single, standard vision transformer trained to recover the shape and depth of a scene. Rather than juggling several specialized outputs, it learns to predict how far away things are from how they appear, using a single target called depth-ray prediction that combines per-pixel depth with camera rays. While the design is simpler than previous approaches, it still produces accurate 3D results, supported by a teacher-student training process in which one version of the model generates training signal for another.

Video demo of Depth Anything 3. Source: ByteDance Seed Team / GitHub

The research team reports that Depth Anything 3 outperforms both its predecessor, Depth Anything V2, and VGGT, the previous state of the art, on the team's own benchmark for pose estimation and geometric accuracy. The project page highlights applications in 3D design, robotics, and immersive media, and the released tools support testing and integration, including export of results to formats such as .glb, .ply, and 3D Gaussian splats.


🌀 Tom’s Take:

By releasing open-source tools, the team is turning the model into more than just a paper, letting developers put it into practice for robotics, 3D design, or VR projects with minimal setup.


Source: Depth Anything 3 Project Page