AndroTMem: From Interaction Trajectories to Anchored Memory in Long-Horizon GUI Agents
Abstract
Long-horizon GUI agents face challenges with interaction memory, which this work addresses through a diagnostic framework and anchored memory approach that improves task completion rates.
Long-horizon GUI agents are a key step toward real-world deployment, yet effective interaction memory under prevailing paradigms remains under-explored. Replaying full interaction sequences is redundant and amplifies noise, while summaries often erase dependency-critical information and traceability. We present AndroTMem, a diagnostic framework for anchored memory in long-horizon Android GUI agents. Its core benchmark, AndroTMem-Bench, comprises 1,069 tasks with 34,473 interaction steps (avg. 32.1 per task, max. 65). We evaluate agents with TCR (Task Complete Rate), focusing on tasks whose completion requires carrying forward critical intermediate state; AndroTMem-Bench is designed to enforce strong step-to-step causal dependencies, making sparse yet essential intermediate states decisive for downstream actions and centering interaction memory in evaluation. Across open- and closed-source GUI agents, we observe a consistent pattern: as interaction sequences grow longer, performance drops are driven mainly by within-task memory failures, not isolated perception errors or local action mistakes. Guided by this diagnosis, we propose Anchored State Memory (ASM), which represents interaction sequences as a compact set of causally linked intermediate-state anchors to enable subgoal-targeted retrieval and attribution-aware decision making. Across multiple settings and 12 evaluated GUI agents, ASM consistently outperforms full-sequence replay and summary-based baselines, improving TCR by 5%-30.16% and AMS by 4.93%-24.66%, indicating that anchored, structured memory effectively mitigates the interaction-memory bottleneck in long-horizon GUI tasks. The code, benchmark, and related resources are publicly available at [https://github.com/CVC2233/AndroTMem](https://github.com/CVC2233/AndroTMem).
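As a rough illustration of the idea behind ASM, the sketch below stores a few intermediate-state anchors with causal links and retrieves only those relevant to a given subgoal. The abstract does not specify ASM's data structures, so the `Anchor` fields, the `AnchoredMemory` class, and the exact-match subgoal lookup are all assumptions made for illustration, not the authors' implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Anchor:
    """One intermediate state worth carrying forward (hypothetical schema)."""
    anchor_id: int
    step: int        # index of the interaction step that produced the state
    subgoal: str     # subgoal the state belongs to
    state: str       # the remembered state, e.g. "order_id=8841"
    depends_on: list[int] = field(default_factory=list)  # causal parent ids


class AnchoredMemory:
    """Toy anchor store: a compact set of causally linked states."""

    def __init__(self) -> None:
        self._anchors: dict[int, Anchor] = {}

    def add(self, step: int, subgoal: str, state: str,
            depends_on: tuple[int, ...] | list[int] = ()) -> int:
        """Record an anchor and return its id; parents must already exist."""
        missing = [d for d in depends_on if d not in self._anchors]
        if missing:
            raise KeyError(f"unknown parent anchors: {missing}")
        anchor_id = len(self._anchors)
        self._anchors[anchor_id] = Anchor(
            anchor_id, step, subgoal, state, list(depends_on)
        )
        return anchor_id

    def retrieve(self, subgoal: str) -> list[Anchor]:
        """Return anchors for `subgoal` plus their causal ancestors, by step."""
        selected: set[int] = set()
        stack = [a.anchor_id for a in self._anchors.values()
                 if a.subgoal == subgoal]
        while stack:
            anchor_id = stack.pop()
            if anchor_id in selected:
                continue
            selected.add(anchor_id)
            # Follow causal links so the result stays attributable.
            stack.extend(self._anchors[anchor_id].depends_on)
        return sorted((self._anchors[i] for i in selected),
                      key=lambda a: a.step)


if __name__ == "__main__":
    memory = AnchoredMemory()
    order = memory.add(3, "find_order", "order_id=8841")
    memory.add(9, "copy_address", "address=12 Elm St")
    memory.add(17, "fill_refund_form", "form_opened", depends_on=[order])
    for anchor in memory.retrieve("fill_refund_form"):
        print(anchor.step, anchor.state)
```

In this toy version, retrieving for one subgoal returns only that subgoal's anchors and the earlier states they depend on, rather than the full interaction history or a free-form summary. A real system would presumably need a softer notion of subgoal matching than string equality.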
Community
We present AndroTMem, a diagnostic framework for anchored memory in long-horizon Android GUI agents, together with AndroTMem-Bench, a benchmark that evaluates memory via TCR on dependency-critical long-horizon tasks. We also propose Anchored State Memory (ASM), which organizes history into causally linked intermediate-state anchors for targeted retrieval and attribution.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- AmbiBench: Benchmarking Mobile GUI Agents Beyond One-Shot Instructions in the Wild (2026)
- ANCHOR: Branch-Point Data Generation for GUI Agents (2026)
- Hybrid Self-evolving Structured Memory for GUI Agents (2026)
- MAGNET: Towards Adaptive GUI Agents with Memory-Driven Knowledge Evolution (2026)
- MemGUI-Bench: Benchmarking Memory of Mobile GUI Agents in Dynamic Environments (2026)
- AMA-Bench: Evaluating Long-Horizon Memory for Agentic Applications (2026)
- Efficient Long-Horizon GUI Agents via Training-Free KV Cache Compression (2026)