T-MAP: Red-Teaming LLM Agents with Trajectory-aware Evolutionary Search
Abstract
T-MAP, a trajectory-aware evolutionary search method, discovers adversarial prompts that bypass safety measures and achieve harmful outcomes through tool interactions in LLM agents.
While prior red-teaming efforts have focused on eliciting harmful text outputs from large language models (LLMs), such approaches fail to capture agent-specific vulnerabilities that emerge through multi-step tool execution, particularly in rapidly growing ecosystems such as the Model Context Protocol (MCP). To address this gap, we propose a trajectory-aware evolutionary search method, T-MAP, which leverages execution trajectories to guide the discovery of adversarial prompts. Our approach enables the automatic generation of attacks that not only bypass safety guardrails but also reliably realize harmful objectives through actual tool interactions. Empirical evaluations across diverse MCP environments demonstrate that T-MAP substantially outperforms baselines in attack realization rate (ARR) and remains effective against frontier models, including GPT-5.2, Gemini-3-Pro, Qwen3.5, and GLM-5, thereby revealing previously underexplored vulnerabilities in autonomous LLM agents.
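As a rough illustration of the headline metric, ARR can be read as the fraction of attack attempts whose execution trajectory actually realizes the objective through tool calls, as opposed to merely avoiding a refusal. The abstract does not give the formal definition, so the outcome labels (`REFUSED`, `ATTEMPTED`, `REALIZED`) and the function name below are assumptions made for illustration, not the paper's evaluation code.

```python
from enum import Enum


class Outcome(Enum):
    # Hypothetical labels; the paper's own judging rubric may differ.
    REFUSED = "refused"      # agent declined the request
    ATTEMPTED = "attempted"  # agent engaged, but the objective was not completed
    REALIZED = "realized"    # objective completed through actual tool calls


def attack_realization_rate(outcomes):
    """Return the fraction of attempts labeled REALIZED (0.0 for no attempts)."""
    outcomes = list(outcomes)
    if not outcomes:
        return 0.0
    realized = sum(1 for o in outcomes if o is Outcome.REALIZED)
    return realized / len(outcomes)
```

Under this reading, an attempt that gets past a refusal but stalls before the tool calls complete does not count toward ARR, which is what separates the metric from a plain text-level attack success rate.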
Community
Red-teaming tool-calling agents
lowkey the most interesting bit is how T-MAP couples cross-diagnosis with a learned tool call graph to steer the evolution toward realistic multi-step tool sequences. that memory of tool-to-tool transitions plus an 8x8 MAP-Elites archive makes the search not just smarter but richer in trajectory diversity, which helps surface broader vulnerabilities instead of chasing a single win. btw the arxivlens breakdown helped me parse the method details, it does a nice job unpacking how diagnosis, graph updates, and mutation actually fit together: https://arxivlens.com/PaperView/Details/t-map-red-teaming-llm-agents-with-trajectory-aware-evolutionary-search-7241-98f1a91b
one question: how sensitive is ARR to inaccuracies in the tool graph or noisy tool-usage data, and would a robustness tweak to the graph still preserve the gains against frontier models?
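The two data structures mentioned in the comment above can be pictured roughly as follows: a memory of tool-to-tool transitions with success counts, and an 8x8 grid archive that keeps the best-scoring candidate per cell in the MAP-Elites style. This is a minimal sketch under stated assumptions. The class names `ToolCallGraph` and `Archive`, the `(tool_name, succeeded)` trajectory format, and the meaning of the two grid axes are hypothetical, since the page does not describe them; candidates are treated as opaque values.

```python
from collections import defaultdict


class ToolCallGraph:
    """Counts observed tool-to-tool transitions and how often they succeeded."""

    def __init__(self):
        # (src_tool, dst_tool) -> [successes, attempts]
        self.counts = defaultdict(lambda: [0, 0])

    def update(self, trajectory):
        """Record transitions from a list of (tool_name, succeeded) pairs."""
        for (src, _), (dst, ok) in zip(trajectory, trajectory[1:]):
            stats = self.counts[(src, dst)]
            stats[1] += 1
            if ok:
                stats[0] += 1

    def success_rate(self, src, dst):
        """Empirical success rate of calling dst right after src (0.0 if unseen)."""
        successes, attempts = self.counts.get((src, dst), (0, 0))
        return successes / attempts if attempts else 0.0


class Archive:
    """Grid archive that keeps only the best-scoring candidate in each cell."""

    def __init__(self, rows=8, cols=8):
        self.rows, self.cols = rows, cols
        self.cells = {}  # (row, col) -> (candidate, score)

    def insert(self, row, col, candidate, score):
        """Store candidate if the cell is empty or the score improves; return True if stored."""
        if not (0 <= row < self.rows and 0 <= col < self.cols):
            raise IndexError("cell outside the archive grid")
        current = self.cells.get((row, col))
        if current is None or score > current[1]:
            self.cells[(row, col)] = (candidate, score)
            return True
        return False

    def coverage(self):
        """Fraction of grid cells that currently hold a candidate."""
        return len(self.cells) / (self.rows * self.cols)
```

Keeping one elite per cell rather than a single global best is what gives the diversity the comment describes: a weaker candidate in an empty cell is still retained. On the question of noisy graphs, note that in a count-based memory like this one a single mislabeled transition shifts `success_rate` a lot when attempts are few, so any robustness tweak (smoothing, minimum-count thresholds) would be a design choice beyond what the page states.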
Get this paper in your agent:
hf papers read 2603.22341
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash