arxiv:2603.22341

T-MAP: Red-Teaming LLM Agents with Trajectory-aware Evolutionary Search

Published on Mar 21

Submitted by Seanie Lee on Mar 26

KAIST AI


Authors: Hyomin Lee, Yumin Choi

Abstract

T-MAP, a trajectory-aware evolutionary search method, discovers adversarial prompts that bypass safety measures and achieve harmful outcomes through tool interactions in LLM agents.

AI-generated summary

While prior red-teaming efforts have focused on eliciting harmful text outputs from large language models (LLMs), such approaches fail to capture agent-specific vulnerabilities that emerge through multi-step tool execution, particularly in rapidly growing ecosystems such as the Model Context Protocol (MCP). To address this gap, we propose a trajectory-aware evolutionary search method, T-MAP, which leverages execution trajectories to guide the discovery of adversarial prompts. Our approach enables the automatic generation of attacks that not only bypass safety guardrails but also reliably realize harmful objectives through actual tool interactions. Empirical evaluations across diverse MCP environments demonstrate that T-MAP substantially outperforms baselines in attack realization rate (ARR) and remains effective against frontier models, including GPT-5.2, Gemini-3-Pro, Qwen3.5, and GLM-5, thereby revealing previously underexplored vulnerabilities in autonomous LLM agents.


Community

Seanie-lee

Paper submitter about 18 hours ago

Red-teaming tool-calling agents

avahal

about 10 hours ago

lowkey the most interesting bit is how T-MAP couples cross-diagnosis with a learned tool call graph to steer the evolution toward realistic multi-step tool sequences. that memory of tool-to-tool transitions plus an 8x8 MAP-Elites archive makes the search not just smarter but richer in trajectory diversity, which helps surface broader vulnerabilities instead of chasing a single win. btw the arxivlens breakdown helped me parse the method details, it does a nice job unpacking how diagnosis, graph updates, and mutation actually fit together: https://arxivlens.com/PaperView/Details/t-map-red-teaming-llm-agents-with-trajectory-aware-evolutionary-search-7241-98f1a91b
one question: how sensitive is ARR to inaccuracies in the tool graph or noisy tool-usage data, and would a robustness tweak to the graph still preserve the gains against frontier models?
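
For readers unfamiliar with the two data structures named in the comment above, here is a minimal sketch of a grid-shaped elite archive (as in MAP-Elites) and a count-based memory of tool-to-tool transitions. It is an illustration under stated assumptions, not the authors' implementation: the class names `EliteArchive` and `ToolTransitionGraph`, the meaning of the two grid axes, the scoring, and the placeholder tool names are all hypothetical, and only the 8x8 size comes from the comment.

```python
from collections import defaultdict

GRID = 8  # 8x8 archive size from the comment; what each axis means is assumed


class EliteArchive:
    """Keeps the best-scoring candidate per grid cell (MAP-Elites style)."""

    def __init__(self, size=GRID):
        self.size = size
        self.cells = {}  # (row, col) -> (score, candidate)

    def insert(self, row, col, score, candidate):
        """Store candidate if the cell is empty or the score improves on it."""
        if not (0 <= row < self.size and 0 <= col < self.size):
            raise IndexError("cell outside archive")
        best = self.cells.get((row, col))
        if best is None or score > best[0]:
            self.cells[(row, col)] = (score, candidate)
            return True
        return False

    def coverage(self):
        """Fraction of cells that hold an elite."""
        return len(self.cells) / (self.size * self.size)


class ToolTransitionGraph:
    """Counts how often one tool call is directly followed by another."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, trajectory):
        """Record every consecutive pair of tool names in one trajectory."""
        for src, dst in zip(trajectory, trajectory[1:]):
            self.counts[src][dst] += 1

    def transition_prob(self, src, dst):
        """Empirical probability of dst following src (0.0 if src unseen)."""
        out = self.counts.get(src, {})
        total = sum(out.values())
        return out.get(dst, 0) / total if total else 0.0


# Usage with placeholder candidates and tool names (all hypothetical).
archive = EliteArchive()
archive.insert(0, 3, 0.4, "candidate-a")
archive.insert(0, 3, 0.7, "candidate-b")             # replaces candidate-a
kept = archive.insert(0, 3, 0.5, "candidate-c")      # rejected, lower score

graph = ToolTransitionGraph()
graph.update(["tool_a", "tool_b", "tool_c"])
graph.update(["tool_a", "tool_c"])
p = graph.transition_prob("tool_a", "tool_b")
```

Keeping one elite per cell is what preserves diversity: a candidate only competes with others in the same cell, so a strong result in one region never evicts a weaker but different one elsewhere. A count-based graph like this one is also where the noise question in the comment would bite, since a few spurious trajectories can shift the transition estimates when counts are small.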