# ganatrask/NOVA
How to use ganatrask/NOVA with Transformers:

```python
# Load model directly
from transformers import Gr00tN1d6

model = Gr00tN1d6.from_pretrained("ganatrask/NOVA", dtype="auto")
```
NOVA (Neural Open Vision Actions) is a fine-tuned version of NVIDIA's GR00T N1.6 vision-language-action model, trained specifically for Pollen Robotics' Reachy 2 humanoid robot.
This model is part of an end-to-end Physical AI pipeline that combines teleoperated data collection, GR00T fine-tuning, and on-robot inference.
| Property | Value |
|---|---|
| Base Model | nvidia/GR00T-N1.6-3B |
| Parameters | ~3B |
| Embodiment | Reachy 2 (custom embodiment tag) |
| Action Space | 8-DOF (7 arm joints + gripper) |
| Training Steps | 30,000 |
| Final Loss | ~0.008-0.01 |
The 8-DOF action vector is ordered as follows:

```python
action = [
    shoulder_pitch,  # -180° to 90°
    shoulder_roll,   # -180° to 10°
    elbow_yaw,       #  -90° to 90°
    elbow_pitch,     # -125° to 0°
    wrist_roll,      # -100° to 100°
    wrist_pitch,     #  -45° to 45°
    wrist_yaw,       #  -30° to 30°
    gripper,         #    0 (closed) to 1 (open)
]
```
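Raw policy outputs can occasionally overshoot the joint limits listed above, so it is prudent to clamp them before sending commands to the robot. A minimal sketch (the limits are taken from the list above; the helper name is illustrative, not part of the NOVA/GR00T API):

```python
# Joint limits in degrees (gripper is normalized 0-1), matching the
# 8-DOF action layout above.
ACTION_LIMITS = [
    (-180.0, 90.0),    # shoulder_pitch
    (-180.0, 10.0),    # shoulder_roll
    (-90.0, 90.0),     # elbow_yaw
    (-125.0, 0.0),     # elbow_pitch
    (-100.0, 100.0),   # wrist_roll
    (-45.0, 45.0),     # wrist_pitch
    (-30.0, 30.0),     # wrist_yaw
    (0.0, 1.0),        # gripper
]

def clamp_action(action):
    """Clip each of the 8 action dimensions to its joint limit."""
    return [max(lo, min(hi, a)) for a, (lo, hi) in zip(action, ACTION_LIMITS)]
```

For example, `clamp_action([200.0, 0.0, 0.0, -50.0, 0.0, 0.0, 0.0, 1.5])` clips the shoulder pitch down to 90° and the gripper to 1.0 while leaving in-range values untouched.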
This model is designed for language-conditioned manipulation on the Reachy 2 humanoid robot.
It was trained on the ganatrask/NOVA dataset with the following configuration:
| Parameter | Value |
|---|---|
| GPU | NVIDIA A100-SXM4-80GB |
| GPUs | 2 |
| Batch Size | 64 |
| Max Steps | 30,000 |
| Save Steps | 3,000 |
| Video Backend | decord |
```bash
python -m gr00t.train \
    --dataset_repo_id ganatrask/NOVA \
    --embodiment_tag reachy2 \
    --video_backend decord \
    --num_gpus 2 \
    --batch_size 64 \
    --max_steps 30000 \
    --save_steps 3000 \
    --output_dir ./checkpoints/groot-reachy2
```
You need to apply a patch to Isaac-GR00T to add the Reachy 2 embodiment tag:
```bash
cd Isaac-GR00T
patch -p1 < ../patches/add_reachy2_embodiment.patch
```
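The authoritative change lives in `patches/add_reachy2_embodiment.patch`; conceptually, it registers a new entry in Isaac-GR00T's embodiment enum so that `--embodiment_tag reachy2` resolves. A rough sketch of the idea (not the actual patch contents):

```python
# Conceptual sketch only - the real change is applied by the patch above.
# It extends Isaac-GR00T's EmbodimentTag enum with a Reachy 2 entry:
from enum import Enum

class EmbodimentTag(Enum):
    # ... existing NVIDIA embodiments ...
    REACHY2 = "reachy2"  # string value matches --embodiment_tag reachy2
```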
```python
from gr00t.data.embodiment_tags import EmbodimentTag
from gr00t.policy.gr00t_policy import Gr00tPolicy
import importlib.util

# Load the Reachy 2 modality config first
spec = importlib.util.spec_from_file_location(
    "modality_config",
    "configs/reachy2_modality_config.py",
)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)

# Load the policy
policy = Gr00tPolicy(
    embodiment_tag=EmbodimentTag.REACHY2,
    model_path="ganatrask/NOVA",  # or a local checkpoint path
    device="cuda",
    strict=True,
)

# Run inference
obs = {
    "video": {"front_cam": image[None, None, :, :, :]},  # (1, 1, H, W, 3)
    "state": {"arm_joints": joints[None, None, :]},      # (1, 1, 7)
    "language": {"annotation.human.task_description": [["Pick up the red cube"]]},
}
action, _ = policy.get_action(obs)
```
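The shape comments above follow a (batch, time, ...) convention, with `image` and `joints` supplied by your camera and robot. A quick NumPy sketch of building a correctly shaped dummy observation (the resolution here is an assumption; the camera and state keys mirror the snippet above):

```python
import numpy as np

# Dummy stand-ins for a real camera frame and joint reading.
H, W = 224, 224                              # resolution is an assumption
image = np.zeros((H, W, 3), dtype=np.uint8)  # single RGB frame
joints = np.zeros(7, dtype=np.float32)       # 7 arm joint angles

obs = {
    # [None, None] prepends batch and time axes: (H, W, 3) -> (1, 1, H, W, 3)
    "video": {"front_cam": image[None, None, :, :, :]},
    # (7,) -> (1, 1, 7)
    "state": {"arm_joints": joints[None, None, :]},
    "language": {"annotation.human.task_description": [["Pick up the red cube"]]},
}
```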
| Metric | Value |
|---|---|
| Inference Speed | ~40ms/step (A100) |
| VRAM Usage | ~44GB / 80GB |
| Training Time | ~6 hours (30K steps) |
If you use this model, please cite:
```bibtex
@misc{nova2025,
  title={NOVA: Neural Open Vision Actions},
  author={ganatrask},
  year={2025},
  publisher={HuggingFace},
  url={https://huggingface.co/ganatrask/NOVA}
}
```
This model inherits the NVIDIA Open Model License from the base GR00T N1.6 model.