Abstract
Generative UI models enable personal agents to synthesize dynamic interfaces with lightweight executable actions for enhanced interaction beyond text-only formats.
As personal agents evolve to handle complex, user-centric tasks, static plain-text chat is rapidly becoming a bottleneck. Generative UI emerges as the necessary new interface layer, dynamically synthesizing the right controls, options, and state from the interaction context in real time. We present Macaron-A2UI, a model for Generative UI in personal agents. Our goal is to move beyond text-only interaction by enabling agents to generate natural language together with lightweight, executable UI actions for information collection, preference refinement, confirmation, and multi-goal organization. We build a large-scale Generative UI corpus from heterogeneous dialogue sources, introduce A2UI-Bench for controlled evaluation, and train 30B, 235B and 754B models with parameter-efficient LoRA-based supervised fine-tuning followed by reward-driven reinforcement learning. The best Macaron-A2UI model reaches 75.6 overall on A2UI-Bench without explicit schema hints, surpassing the strongest full-schema frontier baseline. We release the models, benchmark, and evaluation protocol to support future work on Generative UI for personal agents.
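To make the idea of "natural language together with lightweight, executable UI actions" concrete, here is a minimal sketch of what one agent turn could look like as a data structure, with a simple structural check. Everything in it is an assumption for illustration: the names `UIAction`, `AgentTurn`, `validate_turn`, and the four action kinds (taken from the four uses named in the abstract) are hypothetical and are not the Macaron-A2UI output schema or the A2UI-Bench scoring rule.

```python
from dataclasses import dataclass, field

# Hypothetical action vocabulary, named after the four uses listed in the
# abstract. This is NOT the paper's schema, only an illustration.
ACTION_KINDS = {"collect_info", "refine_preference", "confirm", "organize_goals"}


@dataclass
class UIAction:
    kind: str                                         # one of ACTION_KINDS (assumed)
    label: str                                        # text shown on the control
    options: list[str] = field(default_factory=list)  # choices, if any


@dataclass
class AgentTurn:
    text: str                                         # natural-language reply
    actions: list[UIAction] = field(default_factory=list)


def validate_turn(turn: AgentTurn) -> list[str]:
    """Return a list of problems; an empty list means the turn passed these checks."""
    problems = []
    if not turn.text.strip():
        problems.append("empty text")
    for i, action in enumerate(turn.actions):
        if action.kind not in ACTION_KINDS:
            problems.append(f"action {i}: unknown kind {action.kind!r}")
        if not action.label.strip():
            problems.append(f"action {i}: empty label")
        # Assumed rule: a preference control is only useful with a real choice.
        if action.kind == "refine_preference" and len(action.options) < 2:
            problems.append(f"action {i}: needs at least two options")
    return problems


turn = AgentTurn(
    text="I found two evening flights. Which do you prefer?",
    actions=[
        UIAction("refine_preference", "Departure time", ["18:05", "21:40"]),
        UIAction("confirm", "Book selected flight"),
    ],
)
print(validate_turn(turn))  # []
```

A check of this kind only covers structural validity. Whether the generated controls are the right ones for the conversation is a separate question, and that is the kind of thing a benchmark such as A2UI-Bench would need to assess.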
Community
Macaron-A2UI: A Model for Generative UI in Personal Agents
Interesting work!
the fact you can hit 75.6 on a2ui-bench without explicit schema hints is pretty striking. that schema-light training recipe, with LoRA SFT followed by reward-driven rl, basically lets the model learn to generate executable ui alongside natural language. i’d love to see an ablation where you cut the rl reward model entirely and rely only on supervised fine-tuning; my hunch is rl is doing most of the heavy lifting for action validity and safety. edge cases where controls differ across apps or safety policies kick in could expose brittleness in the generated widgets. btw, arxivlens had a solid breakdown that helped me parse the method details: https://arxivlens.com/PaperView/Details/macaron-a2ui-a-model-for-generative-ui-in-personal-agents-495-62505cf9 do you plan to publish an ablation on rl vs sft and test true cross-app robustness in a follow-up?
Models citing this paper 3
mindlab-research/Macaron-A2UI-Grande