Depth Anything V2 Large: SafeTensors

This is Depth Anything V2 (Large, ViT-L backbone) converted to SafeTensors format for safe, fast loading in robotic depth estimation pipelines. The model has 335M parameters and produces high-quality monocular depth maps.

This model is part of the RobotFlowLabs model library, built for the ANIMA agentic robotics platform, a modular ROS2-native AI system that brings foundation model intelligence to real robots operating in the real world.

Why This Model Exists

Monocular depth estimation is fundamental to robotic navigation and manipulation: robots often need to judge how far away things are using only a single camera. Depth Anything V2 produces high-quality relative depth maps from a single image. The original weights are distributed as raw .pth files, a pickle-based format that can execute arbitrary code when loaded. We converted them to SafeTensors format for safe, zero-copy memory-mapped loading.
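Part of what makes SafeTensors safe to open is its simple layout: an 8-byte little-endian header length, a JSON header describing each tensor, then the raw tensor bytes. The sketch below reads only that header with the standard library, so a file can be inspected without loading or executing anything. The helper names `read_safetensors_header` and `count_parameters` are illustrative and not part of the safetensors package.

```python
import json
import struct


def read_safetensors_header(path):
    """Return the JSON header of a SafeTensors file without reading tensor data."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit header length.
        (header_len,) = struct.unpack("<Q", f.read(8))
        # Next header_len bytes: UTF-8 JSON mapping tensor names to metadata.
        return json.loads(f.read(header_len).decode("utf-8"))


def count_parameters(header):
    """Sum the element counts of all tensors listed in the header."""
    total = 0
    for name, info in header.items():
        if name == "__metadata__":  # optional free-form entry, not a tensor
            continue
        n = 1
        for dim in info["shape"]:
            n *= dim
        total += n
    return total
```

For example, `count_parameters(read_safetensors_header("model.safetensors"))` gives the total number of stored elements, which can be compared against the parameter count stated in this card.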

Model Details

| Property | Value |
|---|---|
| Architecture | DPT head + ViT-Large encoder |
| Parameters | 335M |
| Encoder | ViT-L/14 (DINOv2-based) |
| Input Resolution | Flexible (recommended 518×518) |
| Output | Dense relative depth map |
| Training | Synthetic + real depth labels (multi-stage) |
| Original Model | depth-anything/Depth-Anything-V2-Large |
| License | Apache-2.0 |

Included Files

depth-anything-v2-large/
├── model.safetensors          # 1.3 GB - Full model weights
└── README.md                  # This file

Quick Start

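import cv2
import torch
from safetensors.torch import load_file

# Depth Anything V2 architecture, from the original repository
from depth_anything_v2.dpt import DepthAnythingV2

# Pick the best available device
device = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'

# Load SafeTensors weights
state_dict = load_file("model.safetensors")

# Build the ViT-L model and load the weights into it
model = DepthAnythingV2(encoder='vitl', features=256, out_channels=[256, 512, 1024, 1024])
model.load_state_dict(state_dict)
model.to(device).eval()

# Predict depth; infer_image expects a BGR image as read by OpenCV
image = cv2.imread("example.jpg")  # placeholder path
depth = model.infer_image(image)  # Returns relative depth map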

With Transformers

from PIL import Image
import torch
from transformers import AutoImageProcessor, AutoModelForDepthEstimation

# The Transformers-format checkpoint is published under the "-hf" repository
model_id = "depth-anything/Depth-Anything-V2-Large-hf"
processor = AutoImageProcessor.from_pretrained(model_id)
model = AutoModelForDepthEstimation.from_pretrained(model_id)
model.to("cuda").eval()

image = Image.open("example.jpg")  # placeholder path
inputs = processor(images=image, return_tensors="pt").to("cuda")
with torch.no_grad():
    depth = model(**inputs).predicted_depth  # Relative depth at the processed resolution

With FORGE (ANIMA Integration)

from forge.vision import VisionEncoderRegistry

depth_estimator = VisionEncoderRegistry.load("depth-anything-v2-large")
depth_map = depth_estimator(image_tensor)  # Relative depth map

Use Cases in ANIMA

Depth estimation is critical across ANIMA modules:

  • Obstacle Avoidance: Real-time depth maps for safe navigation
  • Grasp Planning: Estimate object distance for manipulation reach calculations
  • 3D Reconstruction: Dense depth for point cloud generation from a single camera
  • Safety Zones: Distance-based safety boundaries for human-robot collaboration
  • Path Planning: Identify traversable spaces and obstacle heights
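For the 3D reconstruction use case, a depth map becomes a point cloud by back-projecting each pixel through the pinhole camera model. The following is a minimal sketch, not ANIMA's implementation. It assumes the depth values are already metric (the model itself outputs relative depth) and that the camera intrinsics `fx`, `fy`, `cx`, `cy` are known from calibration. The function name `backproject` is illustrative.

```python
def backproject(depth, fx, fy, cx, cy):
    """Convert a metric depth map (rows of z values in metres) to 3D points.

    Pinhole model: X = (u - cx) * z / fx, Y = (v - cy) * z / fy, Z = z,
    where (u, v) is the pixel column and row.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # skip invalid or missing depth
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points
```

In a real pipeline the same arithmetic would normally be vectorized over the whole image. The nested loop is kept here only to make the per-pixel geometry explicit.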

Depth Anything V2 Family

| Model | Params | Size | Best For |
|---|---|---|---|
| depth-anything-v2-large | 335M | 1.3 GB | Highest quality depth |
| depth-anything-v2-small | 24.8M | 95 MB | Real-time edge deployment |
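The listed file sizes are roughly what the parameter counts predict at 4 bytes per parameter. This back-of-the-envelope check assumes the weights are stored as 32-bit floats and ignores the small file header. The function name `weights_size_bytes` is illustrative.

```python
BYTES_PER_PARAM = 4  # assumption: weights stored as 32-bit floats


def weights_size_bytes(num_params, bytes_per_param=BYTES_PER_PARAM):
    """Approximate size of the raw tensor data, ignoring the file header."""
    return num_params * bytes_per_param


for name, params in [("large", 335_000_000), ("small", 24_800_000)]:
    size = weights_size_bytes(params)
    # Report both decimal gigabytes and binary mebibytes, since tools differ.
    print(f"{name}: {size / 1e9:.2f} GB ({size / 2**20:.0f} MiB)")
```

The same estimate is useful when budgeting GPU or edge-device memory: halving the bytes per parameter (for example with 16-bit weights) roughly halves the footprint of the weights.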

Intended Use

Designed For

  • Monocular depth estimation for robotic navigation
  • Dense depth maps for manipulation planning
  • Point cloud generation from RGB cameras
  • Obstacle detection and distance estimation

Limitations

  • Produces relative (not metric) depth; requires calibration for absolute distances
  • Performance degrades on reflective, transparent, or textureless surfaces
  • Single-frame estimation with no temporal consistency for video
  • Inherits biases from training data distribution
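One common way to calibrate relative depth is to fit a scale and shift against a few pixels whose true distance is known. The sketch below assumes the model output behaves like affine-invariant inverse depth (larger values are closer), as with MiDaS-style relative models. That assumption should be checked against your own data. The function names and reference measurements are hypothetical.

```python
def fit_scale_shift(pred, target):
    """Least-squares fit of target ~ scale * pred + shift over paired samples."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_p == 0:
        raise ValueError("predictions are constant; cannot fit a scale")
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    scale = cov / var_p
    shift = mean_t - scale * mean_p
    return scale, shift


def to_metric_depth(pred_value, scale, shift, eps=1e-6):
    """Map a relative prediction to metres, treating it as inverse depth (assumption)."""
    inv_depth = scale * pred_value + shift
    return 1.0 / max(inv_depth, eps)


# Hypothetical reference data: model outputs at pixels whose true distance
# in metres is known, e.g. from a rangefinder or a fiducial marker.
relative = [10.0, 5.0, 2.5]
metric_m = [0.5, 1.0, 2.0]
scale, shift = fit_scale_shift(relative, [1.0 / z for z in metric_m])
```

The fit is done in inverse-depth space because that is where the affine relationship is assumed to hold. The fitted `scale` and `shift` are only valid for the frame they were estimated on unless the scene and camera stay fixed.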

Out of Scope

  • Safety-critical autonomous driving without additional validation
  • Medical depth estimation
  • Surveillance applications

Attribution

Citation

@article{yang2024depth_anything_v2,
  title={Depth Anything V2},
  author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Zhao, Zhen and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
  journal={arXiv preprint arXiv:2406.09414},
  year={2024}
}

Built with FORGE by RobotFlowLabs
Optimizing foundation models for real robots.
