How to use DarthZhu/VideoRLVR-Wan2.2 with Diffusers:
pip install -U diffusers transformers accelerate
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image, export_to_video
# Switch "cuda" to "mps" on Apple devices
pipe = DiffusionPipeline.from_pretrained("DarthZhu/VideoRLVR-Wan2.2", dtype=torch.bfloat16, device_map="cuda")
pipe.to("cuda")
prompt = "A man with short gray hair plays a red electric guitar."
image = load_image(
"https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/guitar-man.png"
)
output = pipe(image=image, prompt=prompt).frames[0]
export_to_video(output, "output.mp4")
VideoRLVR is a reinforcement learning (RL) recipe for training video reasoning models with verifiable rewards. This model is a reinforcement-learning optimized version of Wan2.2-TI2V-5B, presented in the paper Video Models Can Reason with Verifiable Rewards.
The model uses an SDE-GRPO optimization backbone and rule-based feedback to improve visual reasoning in complex, procedurally generated tasks such as Maze, FlowFree, and Sokoban.
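To make "rule-based feedback" and group-based optimization more concrete, here is a minimal sketch and not the authors' implementation. It assumes a hypothetical grid encoding of a Maze task (`MAZE`) and assumes the agent's path has already been extracted from the generated frames as a list of `(row, col)` cells; that extraction step is not shown. `maze_reward` and `group_relative_advantages` are illustrative names. The second function applies the group-wise reward normalization commonly used in GRPO-style methods; the SDE sampling part of SDE-GRPO is not covered here.

```python
from statistics import mean, pstdev

# Hypothetical maze encoding: '#' wall, '.' free cell, 'S' start, 'G' goal.
MAZE = [
    "S.#",
    ".##",
    "..G",
]


def maze_reward(maze, path):
    """Rule-based check: 1.0 if `path` is a valid start-to-goal walk, else 0.0."""
    if not path:
        return 0.0
    rows, cols = len(maze), len(maze[0])
    # Every visited cell must be inside the grid and must not be a wall.
    for r, c in path:
        if not (0 <= r < rows and 0 <= c < cols) or maze[r][c] == "#":
            return 0.0
    # Consecutive cells must be 4-neighbours (no jumps, no diagonal moves).
    for (r1, c1), (r2, c2) in zip(path, path[1:]):
        if abs(r1 - r2) + abs(c1 - c2) != 1:
            return 0.0
    (sr, sc), (gr, gc) = path[0], path[-1]
    return 1.0 if maze[sr][sc] == "S" and maze[gr][gc] == "G" else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of samples drawn for the same prompt.

    Uses the population standard deviation; real implementations may differ.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical trajectories sampled for the same maze.
group = [
    [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)],  # valid solution
    [(0, 0), (0, 1), (1, 1)],                  # walks into a wall
    [(0, 0), (1, 0), (2, 0)],                  # stops before the goal
    [(0, 0), (2, 2)],                          # jumps across the grid
]
rewards = [maze_reward(MAZE, p) for p in group]
advantages = group_relative_advantages(rewards)
```

Because the reward is computed by a program rather than a learned judge, it can be re-checked on any sample. With this normalization, only the trajectory that passes the check receives a positive advantage, and a group whose samples all score the same yields advantages of zero.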
VideoRLVR formulates video reasoning as the generation of verifiable visual trajectories. Key components include:
@article{zhu2026video,
title={Video Models Can Reason with Verifiable Rewards},
author={Tinghui Zhu and Sheng Zhang and James Y. Huang and Selena Song and Xiaofei Wen and Yuankai Li and Hoifung Poon and Muhao Chen},
journal={arXiv preprint arXiv:2605.15458},
year={2026}
}