|8 min read|BreakMyAgent Team

Agent-to-Agent Prompt Injection: The Attack Surface Nobody Is Testing

As multi-agent systems become standard, a new injection surface is opening up. Most agent security tests focus on user input. Here is why that misses the bigger threat.

agent-to-agent injectionmulti-agent securityprompt injection 2026AI agent attack surfaceMCP security

Agent-to-Agent Prompt Injection: The Attack Surface Nobody Is Testing

Most teams testing AI agent security focus on the obvious: can a user manipulate the agent through chat input? That is the right place to start. It is not where the most dangerous injections are happening in 2026.

As multi-agent architectures become standard, a new attack surface has opened. Agents communicating with other agents create injection paths that bypass every defense designed for direct user input.

How single-agent and multi-agent security differ

In a single-agent deployment, the trust model is relatively clear. The system prompt is trusted. User input is untrusted. The agent tries to follow instructions from the system prompt while resisting instructions embedded in user input.

In a multi-agent deployment, that trust model breaks down.

Agent A might orchestrate Agents B, C, and D. Agent A trusts B because B is part of the workflow. B trusts A for the same reason. If an attacker can influence what B says to A, they can inject instructions into A's context through a channel that A treats as trusted.

This is not theoretical. It is the mechanism behind several real-world agent security incidents, including the indirect prompt injection attacks documented in Google's Threat Intel reports from April 2026.

The five agent-to-agent injection surfaces

1. Orchestrator-to-subagent instructions

Orchestrators typically pass task descriptions and context to subagents. If an attacker can modify the data the orchestrator reads before it generates these instructions, they can shape what gets sent downstream. Tool output poisoning (feeding malicious content into a tool's response) is the classic way to do this.

2. Subagent-to-orchestrator results

When a subagent reports results back to an orchestrator, the orchestrator typically trusts that output. An attacker who can influence a subagent's output can inject instructions upward in the agent hierarchy.

3. MCP server responses

MCP (Model Context Protocol) servers expose tools to agents. A compromised or malicious MCP server can inject instructions into the model's context through tool call responses. The model receives what looks like structured data. Inside that data is a prompt injection.

4. External content processed by any agent in a chain

One agent reads a document. Another agent processes the results. If the document contains an injection, it travels through the pipeline. The second agent never directly touched user input, but it is processing a sanitized-looking output that contains adversarial instructions.

5. Shared memory or context

Some multi-agent systems share a memory store. An injection that writes malicious instructions into shared memory can affect any agent that reads from it later. This is persistent injection: the attacker writes it once, and it fires on every subsequent read.

Why existing defenses do not cover this

Input validation and injection classifiers are typically deployed at the agent boundary. They check what comes in from users. They do not check what comes in from other agents, because that is the trusted channel.

Running a full injection classifier on every inter-agent message is expensive and creates latency. Some teams do it. Most do not.

The result: the hardest attacks are coming through the least-defended channel.

What actually works

Treat every external input as untrusted, including inputs from other agents. A message from Agent B should be processed with the same skepticism as a message from a user. If your framework does not let you enforce this cleanly, it is a framework-level design issue worth surfacing.

Restrict tool permissions per agent. The orchestrator probably needs write access to more systems than the subagent does. Apply least privilege at the tool level, not just at the system level.

Log inter-agent messages and anomalies. You cannot respond to attacks you cannot see. Logging what agents say to each other, and alerting on unusual patterns (sudden requests for privileged operations, unexpected tool calls, outputs that do not match expected schemas), is the minimum observability you need.

Test the inter-agent surface, not just the user-facing one. BreakMyAgent's scanner can be run against the outputs of agent pipelines, not just raw user inputs. If Agent B produces outputs that Agent A will consume, scan B's outputs for injection patterns too.

The bottom line

Multi-agent systems are the direction the industry is moving. Orchestrators spawning subagents, agents calling MCP tools, agents handing off tasks to other agents. Every one of those handoffs is an injection surface.

Security testing needs to keep up. Test your agents with BreakMyAgent and include inter-agent communication in your threat model.

Test your system prompt

Apply what you just read. Scan your agent prompt for these vulnerabilities.

Scan My Agent
Scan Agent