Towards Pixel-level VLM Perception via Simple Points Prediction
Tianhui Song
sthui
AI & ML interests
None yet
Recent Activity
upvoted a paper 2 days ago
HYDRA-X: Native Unified Multimodal Models with Holistic Visual Tokenizers
upvoted a paper 13 days ago
Representation Forcing for Bottleneck-Free Unified Multimodal Models
authored a paper 4 months ago
Kimi K2.5: Visual Agentic Intelligence