ONE-Lab/GUI-World
GUI-Vid is the first VideoLLM with strong GUI-oriented capabilities, retrained on the GUI-World dataset.
It was presented in the paper "GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents".
See the GitHub repository for how to use GUI-Vid for GUI understanding tasks.