Ron Zhu
RzZ
AI & ML interests
None yet
Organizations
None yet
VLM
- UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces
  Paper • 2312.15715 • Published • 20
- Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence
  Paper • 2505.23747 • Published • 69
- VideoPrism: A Foundational Visual Encoder for Video Understanding
  Paper • 2402.13217 • Published • 38
- Scaling RL to Long Videos
  Paper • 2507.07966 • Published • 160
Robotic
- Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control
  Paper • 2506.01943 • Published • 25
- LoHoVLA: A Unified Vision-Language-Action Model for Long-Horizon Embodied Tasks
  Paper • 2506.00411 • Published • 31
- SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
  Paper • 2506.01844 • Published • 151
models 11
RzZ/Qwen2.5-VL-3B-GGUF
3B • Updated
• 23
RzZ/Qwen2.5-VL-32B-Instruct-GGUF
0.7B • Updated
• 8
RzZ/sd-v1-4-adapter-seg
Updated
• 1
RzZ/sd-v1-4-adapter-depth
Updated
• 1
RzZ/sd-v1-4-adapter-keypose
Updated
• 2
RzZ/sd-v1-4-adapter-color
Updated
• 2
RzZ/sd-v1-4-adapter-canny
Updated
RzZ/sd-v1-4-adapter-sketch
Updated
RzZ/sd-v1-4-adapter-openpose
Updated
RzZ/sd-v1-4-adapter-keypose-depth
Updated
• 1