Shadow MCP: the inventory problem nobody owns
MCP made every developer an integrator and every hand-edited editor config a piece of unmanaged infrastructure. No CPE, no package record, no gateway log — and no tool in your stack that models it.
The Model Context Protocol is the best thing to happen to the usefulness of AI agents. One small config stanza wires an agent to a filesystem, a database, a browser, an internal API. That's the point of it, and it worked: every developer with an editor became an integrator.
Here's the part nobody planned for. Every one of those config stanzas is now a piece of infrastructure, and it lives outside every system of record you have.
Infrastructure that nothing registers
Consider the most ordinary case imaginable. A developer wants their coding agent to query a staging database, so they add an MCP server to an editor config by hand. Five lines of JSON, thirty seconds, works immediately. Multiply by every developer, every editor, every laptop.
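Here is roughly what such a stanza looks like. Several MCP clients, Claude Desktop and Cursor among them, read a JSON object whose mcpServers key maps a server name to the command that launches it. The server name, package name, and connection string below are hypothetical. The Python sketch only parses the stanza and prints the command line each entry would spawn:

```python
import json
from typing import Dict, List

# A hypothetical hand-added stanza in the "mcpServers" shape several MCP
# clients read. Server name, package, and connection string are illustrative.
EDITOR_CONFIG = """
{
  "mcpServers": {
    "staging-db": {
      "command": "npx",
      "args": ["-y", "example-postgres-mcp", "postgresql://dev:hunter2@staging.internal/app"]
    }
  }
}
"""


def launch_commands(config_text: str) -> Dict[str, List[str]]:
    """Map each configured MCP server name to the argv its client would spawn."""
    config = json.loads(config_text)
    servers = config.get("mcpServers", {})
    return {
        name: [entry["command"], *entry.get("args", [])]
        for name, entry in servers.items()
    }


for name, argv in launch_commands(EDITOR_CONFIG).items():
    print(name, "->", " ".join(argv))
```

What comes out is a launch command, not an installed artifact. In this example the database password also rides along in the arguments.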
Now try to account for it with the tooling you own:
- There's no CPE for it. Your vulnerability scanner models installed packages and known CVEs. A hand-edited JSON stanza is not a package. "Unpinned MCP server with filesystem scope" is a real exposure class, and there is no CVE identifier, no version database, no plugin that reports it.
- There's often no package at all. A typical entry launches npx -y some-mcp-server, resolved from the registry at every launch. Nothing is durably installed for an inventory agent to find. The supply chain runs on demand, at whatever version the registry serves today.
- There's no gateway record. Most MCP servers run as local child processes speaking stdio. The agent-to-tool call is a pipe between two processes on one machine. Your network gateway logs nothing because nothing crossed the network.
- There's no change ticket. No deployment pipeline, no review, no entry in the CMDB. The developer didn't circumvent your change process; the change process has no category for what they did.
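Because no version database has an entry for these launches, anything that wants to report the "unpinned MCP server" class has to read the launch arguments itself. The following is a minimal Python sketch. It assumes servers launched through npx, with the package spec as the first non-flag argument, and it treats only an exact semver as pinned. It is a simplification that does not model every npx option:

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Only an exact version (1.4.2, 1.4.2-beta.1) counts as pinned. No version,
# "latest", "^1.4.0" or "1.x" all let the registry pick what runs next launch.
EXACT_VERSION = re.compile(r"^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?$")


@dataclass
class PackageSpec:
    name: str
    version: Optional[str]

    @property
    def pinned(self) -> bool:
        return self.version is not None and EXACT_VERSION.match(self.version) is not None


def parse_spec(spec: str) -> PackageSpec:
    """Split an npm spec like 'pkg', 'pkg@latest' or '@scope/pkg@1.2.3'."""
    # A leading '@' marks a scope, so search for the version separator after it.
    at = spec.find("@", 1)
    if at == -1:
        return PackageSpec(spec, None)
    return PackageSpec(spec[:at], spec[at + 1:])


def npx_package(command: str, args: List[str]) -> Optional[PackageSpec]:
    """Return the package an npx launch would fetch, or None for other commands.

    Simplification: takes the first argument that is not a flag (-y, --yes)
    and does not model every npx option.
    """
    if command != "npx":
        return None
    for arg in args:
        if not arg.startswith("-"):
            return parse_spec(arg)
    return None


finding = npx_package("npx", ["-y", "some-mcp-server"])
print(finding, "pinned:", finding.pinned)
```

Under this rule a bare npx -y some-mcp-server comes back unpinned. So would an @latest tag or a caret range.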
Shadow IT took years to earn its name. Shadow MCP got there in about eighteen months, and it's growing at the speed of AI-tool adoption, which is to say faster than any inventory process you currently run.
Every tool in the stack stops one layer short
The uncomfortable pattern, when you walk the stack, is that each control you already pay for does its job correctly and still misses this layer.
Your EDR sees the process, but not the tool call. To the process layer, an MCP server is a node process behaving normally. The fact that it's handing repository contents to an agent, and which agent, and under what scope, has no meaning at the level where EDR operates.
Your vulnerability scanner sees the package, but not the MCP config. It will faithfully flag a down-level Node runtime while having no schema for the unpinned server that runtime launches.
Your gateway sees routed traffic, but not the local agent. We covered this in the previous post: traffic that never leaves the laptop cannot be governed by a network chokepoint.
Your IdP sees the token, but not what the agent did with it. Authentication is intact; the invocation layer after authentication is invisible.
Your model vendors' admin consoles each see their own stack, and your engineers use three of them. Per-vendor controls are good and worth keeping. None of them inventories the other vendors' agents, and none sees a hand-added server at all.
This is not a tooling failure, and naming it one would be unfair to tools built before the layer existed. The agent layer is new. It just happens to be new and unowned, which is the combination security teams get paged about two years later.
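To make the EDR gap concrete, consider what actually crosses the pipe. The MCP stdio transport carries newline-delimited JSON-RPC 2.0 messages on the server's stdin. A tool invocation is a tools/call request carrying the tool's name and arguments. The Python sketch below frames and decodes that traffic. The tool names and arguments are hypothetical:

```python
import json
from typing import Dict, List, Tuple


def frame_tool_call(request_id: int, tool: str, arguments: Dict) -> bytes:
    """Encode an MCP tools/call request as one newline-delimited JSON-RPC message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # json.dumps escapes newlines inside strings, so each message stays on one line.
    return (json.dumps(message) + "\n").encode("utf-8")


def tool_calls(stream: bytes) -> List[Tuple[str, Dict]]:
    """Recover (tool, arguments) pairs from bytes written to a server's stdin."""
    calls = []
    for line in stream.splitlines():
        if not line.strip():
            continue
        msg = json.loads(line)
        if msg.get("method") == "tools/call":
            calls.append((msg["params"]["name"], msg["params"].get("arguments", {})))
    return calls


# Hypothetical traffic on the pipe between an agent and a local MCP server.
pipe = frame_tool_call(1, "read_file", {"path": "src/billing/keys.py"})
pipe += frame_tool_call(2, "query", {"sql": "SELECT * FROM customers"})
print(tool_calls(pipe))
```

To the process layer, both requests are just bytes moving between two local processes. Their meaning lives in the method and params fields, which no network chokepoint or process monitor interprets.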
What an Agent Exposure Report actually surfaces
Our position is that the fix starts with inventory, at the endpoint, where agents actually run. That's what Beacon does on day one: a signed, user-space agent with no kernel driver, deployed read-only, observing rather than enforcing. Thirty minutes later, on a pilot cohort, you have an Agent Exposure Report. Concretely, it contains:
- Every AI agent and CLI on every endpoint, with versions and update status: Claude Code, Cursor, Codex, Copilot, and the ones nobody told you about.
- Every MCP server configuration those agents are wired to, including the hand-added ones no gateway or scanner has a record of.
- Hygiene and exposure flags: down-level tools, unpinned or unverified MCP servers, plaintext credentials sitting in agent configs, agents running outside sanctioned accounts.
- A policy baseline: your actual usage replayed against a sensible default policy, showing the violations you'd have caught this week if governance had been on.
Teams that run this discover agents and MCP servers they had no record of. Not because their people were hiding anything, but because no tool was looking.
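You can approximate the discovery half of that report on a single machine by reading the config files agents load at startup. The Python sketch below is illustrative only and is not Beacon's implementation. The candidate paths are assumptions based on common client defaults, and they vary by client, version, and OS. The sketch reads only top-level mcpServers or servers keys:

```python
import json
from pathlib import Path
from typing import Dict, List

# Assumed per-user config locations (Cursor, Claude Code, Claude Desktop on
# macOS). Real paths vary by client, version and OS; project-level config
# files inside repositories would need a separate walk.
CANDIDATE_CONFIGS = [
    ".cursor/mcp.json",
    ".claude.json",
    "Library/Application Support/Claude/claude_desktop_config.json",
]


def discover_servers(home: Path) -> List[Dict]:
    """Return one record per MCP server entry found under the candidate paths."""
    found: List[Dict] = []
    for rel in CANDIDATE_CONFIGS:
        path = home / rel
        if not path.is_file():
            continue
        try:
            config = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, UnicodeDecodeError, json.JSONDecodeError):
            # An unreadable or malformed config is itself worth reporting.
            found.append({"config": str(path), "server": None, "error": "unparseable"})
            continue
        if not isinstance(config, dict):
            continue
        # Some clients use "mcpServers", others "servers"; only top-level keys are read.
        servers = config.get("mcpServers") or config.get("servers") or {}
        if not isinstance(servers, dict):
            continue
        for name, entry in servers.items():
            if not isinstance(entry, dict):
                continue
            found.append({
                "config": str(path),
                "server": name,
                "command": entry.get("command"),
                "args": entry.get("args", []),
                # Record env var names only, so the inventory never copies secret values.
                "env_keys": sorted(entry.get("env", {})),
            })
    return found
```

Calling discover_servers(Path.home()) on a developer laptop returns the hand-added entries alongside sanctioned ones. Each record carries the config path it came from, so a flag can be traced back to a specific file.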
A sensible day-one baseline
You don't need a policy framework to start; you need about five rules, run in monitor mode:
- Pin MCP server versions. No latest, no bare npx -y. This is dependency hygiene applied to the one dependency class that currently escapes it.
- Maintain a sanctioned-server list. Start permissive: flag off-list servers rather than blocking them, and let the flags drive the list.
- No plaintext credentials in agent configs. The report will show you where they are today.
- Sanctioned accounts only. Agents should run under the identities your IdP knows about.
- Monitor for two weeks before enforcing anything. See what policy would have done against real usage. Enforce per rule and per cohort once the false-positive rate is boring.
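The checks in that list, together with the monitor-first rollout, are simple enough to replay against an inventory before anything is enforced. The Python sketch below assumes server records shaped like the hypothetical Server class. The rule names and secret-detection regexes are illustrative heuristics, not a complete credential scanner, and cohorts are left out:

```python
import re
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Set, Tuple

# Illustrative heuristics only; a real credential check needs far more patterns.
SECRET_NAME = re.compile(r"TOKEN|SECRET|PASSWORD|API_KEY", re.IGNORECASE)
URL_WITH_PASSWORD = re.compile(r"://[^/:@\s]+:[^/@\s]+@")
PINNED = re.compile(r"@\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?$")


@dataclass
class Server:
    """One inventoried MCP server entry (hypothetical record shape)."""
    name: str
    package: str  # npm spec as launched, e.g. "some-mcp-server" or "@acme/docs-mcp@2.1.0"
    account: str
    args: List[str] = field(default_factory=list)
    env: Dict[str, str] = field(default_factory=dict)


def package_name(spec: str) -> str:
    """Strip the version from an npm spec, keeping any leading @scope."""
    at = spec.find("@", 1)
    return spec if at == -1 else spec[:at]


def violations(s: Server, sanctioned: Set[str], accounts: Set[str]) -> List[str]:
    """Names of the day-one rules this entry breaks."""
    broken = []
    if not PINNED.search(s.package):
        broken.append("pin-versions")
    if package_name(s.package) not in sanctioned:
        broken.append("sanctioned-servers")
    # Assumption: values written as ${...} are references resolved elsewhere, not literals.
    literal_secret = any(
        SECRET_NAME.search(k) and v and not v.startswith("${") for k, v in s.env.items()
    )
    if literal_secret or any(URL_WITH_PASSWORD.search(a) for a in s.args):
        broken.append("no-plaintext-credentials")
    if s.account not in accounts:
        broken.append("sanctioned-accounts")
    return broken


def replay(
    servers: List[Server],
    sanctioned: Set[str],
    accounts: Set[str],
    enforced: FrozenSet[str] = frozenset(),
) -> Dict[str, List[Tuple[str, str]]]:
    """Monitor mode by default: a violation is only flagged unless its rule is enforced."""
    report: Dict[str, List[Tuple[str, str]]] = {"flagged": [], "blocked": []}
    for s in servers:
        for rule in violations(s, sanctioned, accounts):
            report["blocked" if rule in enforced else "flagged"].append((s.name, rule))
    return report


inventory = [
    Server("staging-db", "example-postgres-mcp", "dev@corp.example",
           args=["postgresql://dev:hunter2@staging.internal/app"]),
    Server("docs", "@acme/docs-mcp@2.1.0", "dev@corp.example"),
]
print(replay(inventory, sanctioned={"@acme/docs-mcp"}, accounts={"dev@corp.example"}))
```

Enforcement is opt-in per rule through the enforced set. Moving one rule from flagged to blocked is therefore a one-line change once its false-positive rate is acceptable.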
Nothing on that list slows a rollout down. Each item is the same hygiene you already apply to packages, secrets, and accounts, extended to the one layer that skipped onboarding.
Where the category goes
Two paragraphs of vocabulary, for those tracking how this space is shaping up. The work described above (inventory, policy over what agents may do, and a tamper-evident audit trail of what they did) is starting to be called Agent Policy Governance (APG). It's the layer frameworks like OWASP's Agentic Security Initiative, the NIST AI RMF, and ISO/IEC 42001 all quietly presume exists when they ask you to prove what your agents are permitted to do and what they did.
Its counterpart, Agent Detection & Response (ADR), watches agent behavior for what no policy anticipated. Mature security categories tend to grow both muscles, and we expect this one will too. But detection presupposes a sensor on an inventory you trust, which is why the order matters: you cannot govern what you can't see, and until now nobody could see this layer.
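At its core, the "tamper-evident audit trail" in the APG definition is usually a hash chain. Each entry's hash covers its own content and the previous entry's hash, so rewriting history invalidates every later link. The Python sketch below shows only the chaining. The event fields are hypothetical, and a real deployment would also sign entries or anchor the latest hash outside the log:

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64


def _digest(prev: str, event: Dict) -> str:
    # sort_keys makes the serialization, and therefore the hash, deterministic.
    body = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev + body).encode("utf-8")).hexdigest()


def append(log: List[Dict], event: Dict) -> None:
    """Append an event whose hash covers its content and the previous entry's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev, "hash": _digest(prev, event)})


def verify(log: List[Dict]) -> bool:
    """Recompute the chain.

    Editing, removing, or reordering any entry breaks verification. Truncating
    the tail is only detectable if the latest hash is also stored elsewhere.
    """
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True


audit: List[Dict] = []
append(audit, {"agent": "claude-code", "server": "staging-db", "rule": "pin-versions", "action": "flagged"})
append(audit, {"agent": "cursor", "server": "docs", "action": "allowed"})
print(verify(audit))
```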
Start with seeing. Get your Agent Exposure Report.