[Research]
The MCP supply chain: a new class of risk
An agent is only as trustworthy as the least trustworthy server it connects to. The Model Context Protocol has created a live software supply chain that most security programs have never inventoried, let alone governed. Here is why it is a distinct class of risk, and a model for containing it.
[Key takeaways]
- MCP turns every connected server into a link in your agent's supply chain, and agents inherit the union of every server's permissions.
- The threat is not a rogue app but transitive trust: a compromised or mutable server can act with all the access the agent holds.
- MCP shares the shape of classic package-registry risk (npm, PyPI) but is worse in three ways: descriptions drive behavior, servers are mutable after approval, and there is no integrity layer.
- Tool poisoning, tool shadowing, rug pulls, and token passthrough are not implementation bugs. They are consequences of the protocol's trust model.
- The answer is a governance model, not a blocklist: inventory, least-privilege scoping, continuous verification, and inline enforcement on the connection path.
Agents made permissions transitive
For a decade, security programs have reasoned about applications as bounded things. An app has a scope, a set of granted permissions, and a blast radius you can draw a box around. The Model Context Protocol breaks that model quietly. When an agent connects to an MCP server, it does not call a walled-off API. It loads that server's tools into its own context and can invoke them with the agent's own identity and access.
The practical consequence is that permissions became transitive. An agent that can read your repositories, hit your internal APIs, and touch a database now extends all of that reach to every server it connects to. Connect five servers and the agent acts on the union of what those five servers expose and instruct, mediated only by a language model deciding which tool to call and when. The trust boundary is no longer the app. It is the entire graph of servers reachable from the agent.
This is why MCP is best understood as a supply chain rather than a set of integrations. Each server is a dependency. Each dependency can pull in behavior you did not write, did not review, and cannot see at the moment it executes. The industry spent years learning to reason about code dependencies. Agents introduced a second dependency graph, made of servers and tool descriptions, and almost no one is inventorying it.
Four failure modes baked into the trust model
The risks below are not exotic exploits or vendor bugs. They follow directly from how MCP asks clients to trust servers. The protocol defines how tools are described and invoked, but it has historically provided no native defense against a server that describes itself dishonestly or changes its behavior after you approve it.
Tool poisoning
A tool's description is not documentation. It is instruction that the model reads and acts on. A poisoned server embeds hidden directives in a tool description, for example telling the model to read a local SSH key or an environment file and pass its contents as an argument, while the tool presents itself as something innocuous like an addition helper. Users typically see a simplified summary in the client UI; the model sees the full description. That gap between what the human approves and what the model obeys is the attack surface.
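One way to narrow that gap is to scan the full description text, the version the model sees, before a tool is loaded. The sketch below is a minimal heuristic under stated assumptions: the pattern names and regular expressions are illustrative, not a vetted ruleset, and a determined attacker can phrase a directive to evade them. It should be read as a first-pass filter, not a defense on its own.

```python
import re

# Illustrative patterns only (assumption): a real ruleset would be broader,
# regularly updated, and combined with other signals.
SUSPICIOUS_PATTERNS = {
    "sensitive_path": re.compile(r"(~/\.ssh|id_rsa|\.env\b|/etc/passwd)", re.IGNORECASE),
    "concealment": re.compile(
        r"do not (tell|mention|inform)|without (telling|notifying) the user",
        re.IGNORECASE,
    ),
    "hidden_block": re.compile(r"<\s*(important|system|secret)\s*>", re.IGNORECASE),
}


def scan_description(description: str) -> list[str]:
    """Return the name of every suspicious pattern found in a tool description."""
    return [
        name
        for name, pattern in SUSPICIOUS_PATTERNS.items()
        if pattern.search(description)
    ]
```

A description that claims to add two numbers while also asking the model to read `~/.ssh/id_rsa` and keep quiet about it would trip all three patterns, whereas a plain "Adds two numbers and returns the sum." returns an empty list.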
Tool shadowing
In a multi-server session, tools share one context. A malicious server can publish a description that redefines how a trusted server's tool should behave, hijacking the agent without the attacker's own tool ever being called by name. The compromise rides on a legitimate tool, which makes it nearly invisible in any per-tool review.
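Because shadowing works by one server's description talking about another server's tool, a session-level check can look for exactly that cross-reference. The following is a rough sketch: the `tools_by_server` shape (server name mapped to tool name mapped to description) is an assumption made for illustration, and simple name matching will miss paraphrased references and may flag innocent mentions.

```python
import re


def find_cross_server_references(
    tools_by_server: dict[str, dict[str, str]],
) -> list[tuple[str, str, str]]:
    """Return (server, tool, referenced_tool) for each description that names
    a tool owned by a different server in the same session."""
    # Map each tool name to its owning server. If two servers publish the same
    # tool name, the later one wins here; a fuller check would flag the collision.
    owner = {
        tool: server
        for server, tools in tools_by_server.items()
        for tool in tools
    }
    findings = []
    for server, tools in tools_by_server.items():
        for tool, description in tools.items():
            for other_tool, other_server in owner.items():
                if other_server == server:
                    continue
                if re.search(rf"\b{re.escape(other_tool)}\b", description):
                    findings.append((server, tool, other_tool))
    return findings
```

The point of running this across the whole session rather than per tool is that the offending text lives in a tool that may never be invoked.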
Rug pulls (mutable servers)
Approval is a point-in-time act. MCP clients have generally not pinned versions or verified integrity, so a server that was benign when you approved it can change its tool descriptions later and inject new behavior. Initial trust does not guarantee ongoing safety. This is the same class of problem as a package that ships clean and pushes a malicious patch update, except here the payload is natural-language instruction the model will follow.
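A client or gateway can approximate the pinning that package managers provide by fingerprinting the tool definitions it approved and comparing on every later connection. This is a minimal sketch, assuming each tool is available as a dictionary with `name`, `description`, and `inputSchema` fields; where the pinned fingerprint is stored, and what happens on a mismatch, are left open.

```python
import hashlib
import json


def fingerprint_tools(tools: list[dict]) -> str:
    """Hash every tool definition in a stable order so that any change to a
    name, description, or schema produces a different fingerprint."""
    canonical = json.dumps(
        sorted(tools, key=lambda tool: tool["name"]),
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def has_changed_since_approval(pinned_fingerprint: str, current_tools: list[dict]) -> bool:
    """True when the server's current tool list no longer matches what was approved."""
    return fingerprint_tools(current_tools) != pinned_fingerprint
```

Sorting tools by name and serializing with sorted keys keeps the fingerprint independent of the order in which a server happens to list its tools, so only a real change in content trips the check.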
Token passthrough and the confused deputy
When a server accepts a token from the client and forwards it to a downstream API without confirming the token was actually issued for that server, it becomes a confused deputy: a component with legitimate access being steered by an attacker to act on their behalf. The MCP authorization spec explicitly forbids token passthrough for exactly this reason, yet some implementations still log, expose, or forward tokens to services that have no business seeing them, enabling privilege escalation and lateral movement across trust boundaries.
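The check a server should make before acting on a token can be sketched as an audience comparison. The example assumes the token is a JWT whose signature and expiry have already been verified by a proper library, and that the claims are available as a dictionary; the function name and the example URLs are hypothetical. The `aud` claim may be a single string or a list of strings.

```python
def token_is_for_this_server(claims: dict, expected_audience: str) -> bool:
    """Accept a token only if its audience claim names this server.

    Assumes signature and expiry were verified before this function is called.
    """
    aud = claims.get("aud")
    if isinstance(aud, str):
        audiences = [aud]
    elif isinstance(aud, (list, tuple)):
        audiences = list(aud)
    else:
        # Missing or malformed audience: refuse rather than guess.
        return False
    return expected_audience in audiences
```

A server that passes this check still should not forward the same token downstream. It would obtain a separate token scoped to the downstream service instead.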
Like npm and PyPI, only harder
Security teams already have muscle memory for software supply chain risk. Typosquatting, dependency confusion, and malicious post-install scripts on npm and PyPI are well-worn ground. MCP rhymes with all of it. Public catalogs already list many thousands of servers with minimal vetting, the official registry is a metaregistry of pointers rather than audited code, duplicate and near-duplicate entries are rampant, and typosquatting a server name is entirely feasible. Many servers are launched through package runners such as npx or uvx that resolve dependencies at startup, so every launch re-inherits the risk of the underlying npm or PyPI package being compromised. So far, so familiar.
But three differences make MCP a genuinely new class of risk rather than a relabeling of the old one.
First, the artifact is behavior, not just code. A malicious npm package has to run to do harm. A malicious MCP server can do harm through a description alone, because the model reads the description as instruction and acts on it. You cannot fully assess a server by reading its source; you have to reason about what its tool text will make an agent do.
Second, the dependency is mutable and remote. A pinned package hash is stable. A hosted MCP server can change what it exposes between two invocations, with no version bump you would notice. The rug pull is not a supply chain edge case here; it is the default posture unless a client enforces pinning and integrity, which many do not.
Third, the blast radius is the agent's full authority. A compromised dev dependency runs with the developer's local permissions at build time. A compromised MCP server runs with everything the agent can reach at any time it is connected: repositories, internal services, credentials, and every other tool in the session. The amplification is structural.
Trust is not a property of a server
The instinct is to solve this with a list: an allowlist of approved servers, a blocklist of bad ones. That instinct is where most programs get stuck, because it treats trust as a static attribute you assign once. In a world of mutable, remote, description-driven servers, trust is a property of a specific connection at a specific moment, not of a server in the abstract.
A server that passed review last month may have changed its tool descriptions this morning. A server that is safe for a low-privilege agent may be dangerous for one that can reach production. The same server can be fine in one session and a liability in another purely because of which other tools share the context. Any governance model that resolves trust to a yes-or-no verdict at onboarding time will be wrong the moment something changes, and something always changes.
The durable framing is closer to zero trust than to allowlisting. Assume any server can be compromised or can turn hostile after approval. Grant the narrowest access that lets an agent do its job. Verify continuously rather than once. And keep enforcement on the path the connection actually travels, so a change in a server's behavior is caught inline rather than in a quarterly review. That is the model the next section lays out.
Inventory the graph
You cannot govern a supply chain you cannot see. Discover every MCP server every agent connects to, across IDEs, CLIs, and desktop apps, and map the tools and access each one pulls in. Treat the server list as a living dependency graph, not a one-time survey.
Scope to least privilege
Resolve trust per connection, not per server. Gate which servers an agent may load based on the agent's own privilege and the sensitivity of its session. Deny token passthrough, scope downstream tokens to the specific service and user, and keep high-authority agents on a short list of vetted servers.
Verify continuously
Approval is not a one-time event. Watch for changed tool descriptions, new tools appearing mid-session, and cross-server shadowing. Re-check integrity on every connection so a rug pull is caught when it happens, not after it has been exploited.
Enforce inline
Put controls on the connection path itself. Inspect tool calls and their arguments as they flow, redact secrets and PII before they leave, block unapproved or high-risk servers at connect time, and produce evidence mapped to ISO/IEC 42001, the EU AI Act, SOC 2, and ISO/IEC 27001.
Four control domains for the MCP supply chain
The four domains above compose into a single governance model. They are ordered deliberately: you cannot scope what you have not inventoried, you cannot verify what you have not scoped, and enforcement is only credible when it sits on top of the other three. Read together, they turn the abstract principle of per-connection trust into something operable.
Inventory gives you the dependency graph. Every agent, every server it reaches, every tool and every access grant those tools imply. This is the MCP-specific extension of shadow AI discovery, and it is the foundation for everything else. For the hands-on rubric on how to evaluate a single server before it connects, see the companion guide on vetting MCP servers.
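As a starting point, the graph can be assembled from the configuration each client already keeps. The sketch below assumes the configs have been parsed into dictionaries and that servers are listed under an `mcpServers` key, as several MCP clients do; file locations and exact shape vary by client, and the agent and server names are hypothetical. It covers only the agent-to-server edges, not the tools and grants behind each server.

```python
from collections import defaultdict


def build_inventory(configs_by_agent: dict[str, dict]) -> dict[str, list[str]]:
    """Map each MCP server name to the sorted list of agents configured to use it."""
    agents_by_server = defaultdict(set)
    for agent, config in configs_by_agent.items():
        # Assumption: servers live under "mcpServers"; agents without the key
        # simply contribute no edges.
        for server_name in config.get("mcpServers", {}):
            agents_by_server[server_name].add(agent)
    return {
        server: sorted(agents)
        for server, agents in sorted(agents_by_server.items())
    }
```

Inverting the view from "what does this agent load" to "who loads this server" is what makes the result usable as a dependency graph: a single compromised server immediately shows which agents are exposed.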
Least-privilege scoping turns the inventory into policy. The question is never simply "is this server trusted?" but "should this agent, in this context, be allowed to load this server?" High-authority agents get a narrow, vetted set. Experimental agents can range wider in a sandbox. Downstream tokens are exchanged and scoped, never passed through.
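Expressed as code, the per-connection question becomes a function of the agent and its context rather than a lookup on the server alone. This is an illustrative sketch only: the privilege labels, the `ConnectionRequest` shape, and the vetted server names are all hypothetical, and a real policy would carry more dimensions such as session sensitivity.

```python
from dataclasses import dataclass

# Hypothetical short list of servers vetted for high-authority agents.
VETTED_FOR_HIGH_PRIVILEGE = frozenset({"internal-git", "internal-docs"})


@dataclass(frozen=True)
class ConnectionRequest:
    agent_privilege: str  # "high" or "low" (assumed labels)
    server: str
    sandboxed: bool


def may_load(request: ConnectionRequest) -> bool:
    """Decide whether this agent, in this context, may load this server."""
    if request.agent_privilege == "high":
        # High-authority agents stay on the narrow, vetted set.
        return request.server in VETTED_FOR_HIGH_PRIVILEGE
    if request.agent_privilege == "low":
        # Experimental agents may range wider, but only inside a sandbox.
        return request.sandboxed
    # Unknown privilege level: deny by default.
    return False
```

The same server name can yield different answers for different requests, which is the practical meaning of resolving trust per connection.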
Continuous verification accepts that servers change. It re-checks tool descriptions and integrity on every connection, flags net-new tools and shadowing attempts, and treats a description that changed since approval as a signal, not noise. This is what defeats the rug pull.
Inline enforcement is where the model earns its keep. A transparent proxy on the connection path can inspect prompts and tool traffic as they happen, redact secrets and PII inline, risk-score and gate servers at connect time, and block unapproved models and tools without waiting for a human review cycle. Because detection runs locally on the endpoint, nothing has to leave the network for the control to work. This is exactly the posture Cerbera was built to provide: inventory, scoping, verification, and enforcement on one path, with evidence that maps straight to your compliance frameworks.
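To make the inline redaction step concrete, here is a generic sketch of scrubbing tool-call arguments before they leave the endpoint. It does not describe any particular product's implementation. The patterns (an AWS-style access key ID, a PEM private key block, an email address) are illustrative assumptions, and regex matching alone will produce both misses and false positives.

```python
import re

# Illustrative patterns only (assumption); real detection needs a broader set.
REDACTION_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),  # AWS-style access key ID
    re.compile(
        r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
        re.DOTALL,
    ),
    re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),  # email address
]


def redact(value):
    """Recursively replace matches in strings nested inside dicts and lists.

    Non-string scalars (numbers, booleans, None) are returned unchanged.
    """
    if isinstance(value, str):
        for pattern in REDACTION_PATTERNS:
            value = pattern.sub("[REDACTED]", value)
        return value
    if isinstance(value, dict):
        return {key: redact(item) for key, item in value.items()}
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value
```

Walking the argument structure recursively matters because secrets tend to sit in nested fields, such as a list of recipients or a free-text note, rather than at the top level of a tool call.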