MCP-ITP: An Automated Framework for Implicit Tool Poisoning in MCP
Abstract
Implicit tool poisoning attacks in LLM agents manipulate behavior through malicious metadata without invoking poisoned tools, requiring automated frameworks for detection and mitigation.
To standardize interactions between LLM-based agents and their environments, the Model Context Protocol (MCP) was proposed and has since been widely adopted. However, integrating external tools expands the attack surface, exposing agents to tool poisoning attacks. In such attacks, malicious instructions embedded in tool metadata are injected into the agent context during MCP registration phase, thereby manipulating agent behavior. Prior work primarily focuses on explicit tool poisoning or relied on manually crafted poisoned tools. In contrast, we focus on a particularly stealthy variant: implicit tool poisoning, where the poisoned tool itself remains uninvoked. Instead, the instructions embedded in the tool metadata induce the agent to invoke a legitimate but high-privilege tool to perform malicious operations. We propose MCP-ITP, the first automated and adaptive framework for implicit tool poisoning within the MCP ecosystem. MCP-ITP formulates poisoned tool generation as a black-box optimization problem and employs an iterative optimization strategy that leverages feedback from both an evaluation LLM and a detection LLM to maximize Attack Success Rate (ASR) while evading current detection mechanisms. Experimental results on the MCPTox dataset across 12 LLM agents demonstrate that MCP-ITP consistently outperforms the manually crafted baseline, achieving up to 84.2% ASR while suppressing the Malicious Tool Detection Rate (MDR) to as low as 0.3%.
Community
The implicit-tool-poisoning setup is a great example of why agent security needs to look beyond the invoked tool. If malicious metadata can steer the agent into using a different high-privilege tool, then scanning only the final tool call misses the real attack path.
We are building Armorer Guard around that runtime boundary: local Rust scanner, structured JSON verdicts, Python support, and labels for prompt injection, sensitive-data request, exfiltration, safety bypass, destructive command, and system-prompt extraction. The goal is to score tool metadata, retrieved context, and proposed tool-call payloads before the agent acts.
Demo: https://huggingface.co/spaces/armorer-labs/armorer-guard-demo
Repo: https://github.com/ArmorerLabs/Armorer-Guard
A useful benchmark extension here would be measuring whether a guard catches the poisoned metadata before it becomes an apparently legitimate downstream tool invocation.
Get this paper in your agent:
hf papers read 2601.07395 Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash Models citing this paper 0
No model linking this paper
Datasets citing this paper 1
Spaces citing this paper 0
No Space linking this paper
Collections including this paper 0
No Collection including this paper