NITP: Next Implicit Token Prediction for LLM Pre-training • Paper • 2605.24956 • Published 11 days ago • 31
LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding • Paper • 2605.27365 • Published 9 days ago • 135
LLaVA-UHD v4: What Makes Efficient Visual Encoding in MLLMs? • Paper • 2605.08985 • Published 26 days ago • 22
GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER-GGUF • Image-Text-to-Text • 35B • Updated 20 days ago • 2.54k • 3
WebWorld: A Large-Scale World Model for Web Agent Training • Paper • 2602.14721 • Published Feb 16 • 19