Kane Narraway, in his May 2026 post "You Can't Bootstrap Trust," made an argument about endpoints that applies almost word for word to AI tools.
The original argument: trust at the endpoint isn't a single control. It's a chain. Components from a known supplier, asset inventory enrollment before user access, MDM tied to a specific person, baseline hardening. If any link is weak, the whole thing collapses. The failure mode he names is "trust where none exists": a personal MacBook with open MDM enrollment that appears managed in the console without anyone verifying the device is the device.
Now replace MacBook with MCP server. The same pattern shows up across every team running AI agents in production. A proxy at the front. A nice dashboard. A vendor with green status indicators. None of which add up to a working trust chain when the substrate is unverified.
This post is about where the AI trust chain breaks in practice, and what an end-to-end version looks like.
The chain that has to hold for an AI agent
Walking from the human prompt to the API call the agent makes, there are at least seven links. Each one is a separate control. Each has to hold for the chain to mean anything.
- The model. Provenance verified. Weights checked against a signed manifest. Anthropic's whitepaper notes that injecting 250 malicious documents has been shown to backdoor LLMs from 600M to 13B parameters; the backdoor persists through safety training. If you can't verify the model came from where you think, the rest is downstream of an unknown.
- The agent runtime. Sandboxed at the OS level. File system and network isolation. Identity-based, not just network-based. Anthropic's framework treats sandboxing as table stakes for any agent processing untrusted input.
- The MCP server (if one is in the path). Self-hosted on immutable infrastructure. Signed. Verified before each update. The first documented in-the-wild malicious MCP server impersonated a legitimate email service and silently copied every sent message.
- The credential the agent presents. Per-agent, never shared. Short-lived. Hardware-bound where possible. Never written to a config file.
- The scope the credential carries. Declared in advance, narrow, time-bound. Not "read all of Gmail." Something closer to "read the last 24 hours of messages from this address."
- The audit log entry the action produces. Attributes the action to a specific agent instance, captures the session intent, names the triggering origin. Not just a service-account API call.
- The resource the action hits. Accepts the credential, applies its own access policy at the receiving end, logs what happened on its side.
Each link is a control. Skip any one and the rest don't compensate.
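As one concrete instance of the first link, here is a minimal sketch of checking weight files against a manifest of SHA-256 digests. It assumes the manifest's own signature has already been verified by some other means; the function names and the manifest layout (file name mapped to hex digest) are illustrative, not taken from any particular tool.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large weight shards are never fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_weights(manifest: dict, weights_dir: Path) -> list:
    """Return the names of files that are missing or whose digest differs.

    Assumption: `manifest` maps file name -> expected hex digest, and its
    signature was already checked before this function is called.
    """
    failures = []
    for name, expected in manifest.items():
        path = weights_dir / name
        if not path.is_file() or not hmac.compare_digest(sha256_file(path), expected):
            failures.append(name)
    return failures
```

An empty return value means every listed file matched. Anything else should stop the load rather than log a warning, since the remaining six links sit downstream of this one.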
Where the chain breaks in practice
These are the most common breaks across teams running AI agents today.
Unsigned MCP servers from public registries. A team using Cursor or Claude Code adds a new tool by pulling an MCP server from a GitHub repo. Nobody verifies the server's signature because nobody asked for one. The server runs as a privileged identity inside the team's network. If the upstream maintainer changes intent or gets compromised, every agent using the server inherits the change.
Static API keys in config files. The agent authenticates with a long-lived token committed to a config file checked into git. Anthropic puts it bluntly: "Static API keys and shared service-account passwords are among the first things an attacker with model-assisted code analysis will find." The rotation policy is meaningless because the key is grep-able and the model assisting the attacker is faster than the rotation cadence.
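To make "grep-able" concrete, here is a rough sketch of how little it takes to find key-shaped strings in a config file. The two patterns are illustrative assumptions only; real secret scanners use much larger rule sets, and a clean result here proves nothing.

```python
import re

# Illustrative patterns only, not a complete rule set.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),  # shape of an AWS access key ID
    re.compile(r"(?i)(api[_-]?key|token|secret)\s*[:=]\s*['\"]?[A-Za-z0-9_\-]{20,}"),
]


def find_secrets(text: str) -> list:
    """Return (line_number, stripped_line) pairs for lines matching any pattern."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if any(p.search(line) for p in SECRET_PATTERNS):
            hits.append((lineno, line.strip()))
    return hits
```

Running something like this in a pre-commit hook catches the obvious cases, but it does not remove the underlying problem: a long-lived key that exists at all can still leak somewhere the scanner never looks.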
Service accounts shared across agents. One claude-prod-mcp account performs 1,400 database reads and 47 writes this week. Nobody can say which agent did what. Every higher-layer control inherits this ambiguity; the audit log can't tell three indistinguishable scenarios apart.
OAuth tokens with no enforced scope at the resource. The IdP issues a token. The SaaS API receives it. The SaaS API only knows whether the token is valid, not whether the action is in scope for the session. The whole runtime check happens upstream of the resource and is invisible to the resource.
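A sketch of what a resource-side check could look like, using the narrow mail scope from the chain above as the example. The `Grant` fields and the `authorize` function are hypothetical names; the sketch assumes the token was already validated and its claims decoded before this point.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Grant:
    """Claims carried by an already-validated token (field names are hypothetical)."""
    agent_id: str
    action: str           # e.g. "mail:read"
    sender: str           # only messages from this address
    max_age: timedelta    # only messages newer than now - max_age
    expires_at: datetime


def authorize(grant: Grant, action: str, sender: str,
              sent_at: datetime, now: datetime) -> bool:
    """Resource-side check: a valid token is necessary but not sufficient."""
    if now >= grant.expires_at:
        return False
    if action != grant.action or sender != grant.sender:
        return False
    return now - sent_at <= grant.max_age
```

The point of the design is where the function runs. If the same logic lives only in a gateway upstream, the resource still accepts any valid token for any action.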
Audit logs that stop at the service account. The log says service_account_x performed action_y at time_z. There is no chain back to the human prompt or the session intent. A reviewer six months later sees a sequence of API calls and can reconstruct neither cause nor reason.
These five breaks share a property. They are upstream of whatever proxy or gateway the team has bolted on. The proxy can't see them, can't fix them, and can't compensate for them.
The proxy pattern as bootstrap trust
The most common shape of "AI governance" right now is a proxy or gateway in front of the agent. Prompt pattern matching, tool call logging, some allowlisting. These are useful, but they are exactly the layer Kane warned about: a control that depends on every user routing through it, every agent respecting it, and the underlying credentials being hidden behind it.
What Kane wrote about endpoint proxies translates directly. "Each [missing foundational control] is a separate control that has to hold for the proxy to be meaningful." Without application allowlisting at the endpoint, an engineer can install a different AI tool. Without API key hygiene, an engineer can grab a key and pipe it into something that doesn't go through the proxy. Without identity-bound credentials, a leaked token works just as well outside the proxy as inside.
The proxy is one control in a chain. Treating it as the chain is bootstrap trust.
What end-to-end looks like
Each link in the chain has a control that survives the question Anthropic asks across the whitepaper: does this make the attack impossible, or just tedious?
- Model integrity: verified weights, attested execution environment. Impossible to substitute, not just tedious.
- Runtime: sandbox at the OS level, identity-isolated. It can reach only the named services, not every service the network would allow.
- MCP servers: self-hosted on immutable infrastructure, signed releases, verified before each update. Not "we trust the upstream maintainer," but "we verify each release."
- Credentials: per-agent, ephemeral, hardware-bound where possible. An attacker who compromises the runtime finds no cached credentials to steal.
- Scope: declared, narrow, time-bound. The credential carries less than the agent could theoretically need.
- Audit: attribution chain from human prompt or scheduled trigger through to the API call. Three indistinguishable scenarios become distinguishable.
- Resource: enforces its own access policy. The chain doesn't end at the API gateway; it ends at the database row.
None of these are exotic. Most are in the Anthropic whitepaper at the Foundation or Enterprise tier. What's exotic is having all of them present at once, on the same agent, with the chain documented and tested.
The minimum a small team can stand up
The Kane caveat is real. Most teams will not stand up a complete chain in one quarter. The pragmatic question is which links to harden first.
Three of them close the most common breaks without requiring SPIFFE or a CA.
The first is MCP server provenance. Self-host any MCP server that touches production. Sign it. Verify it before each update. Treat each new server like a new third-party API: review before it goes live, not after.
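A minimal sketch of a "review before it goes live" gate. Note the substitution: this uses digest pinning, not signature verification, because it needs only the standard library. Real release signing would use asymmetric keys and a dedicated signing tool. The server name, version, and `APPROVED` table are hypothetical.

```python
import hashlib
import hmac

# Hypothetical lock entries: (server name, version) -> SHA-256 of the reviewed artifact.
APPROVED = {
    ("mail-mcp", "1.4.2"): hashlib.sha256(b"reviewed build bytes").hexdigest(),
}


def may_deploy(name: str, version: str, artifact: bytes) -> bool:
    """Allow an update only if this exact build was reviewed and pinned."""
    expected = APPROVED.get((name, version))
    if expected is None:
        return False  # never reviewed: fail closed
    actual = hashlib.sha256(artifact).hexdigest()
    return hmac.compare_digest(actual, expected)
```

The gate fails closed in both directions: an unreviewed version is rejected, and so is a reviewed version whose bytes have changed since review.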
The second is per-agent credentials. Stop using shared service tokens for agents. Each agent instance gets its own credential, scoped to what it needs, time-bounded. Use OAuth client credentials or STS-style brokering, whichever fits your stack. The credential never lives in a config file.
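To illustrate the properties rather than any specific product, here is a toy broker that mints a per-agent, scoped, expiring token using an HMAC over the claims. This is a stand-in for OAuth client credentials or STS-style brokering, not a replacement for either; `mint`, `verify`, and the claim names are assumptions, and the signing key is assumed to live only inside the broker.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def mint(key: bytes, agent_id: str, scope: str, ttl_s: int,
         now: Optional[float] = None) -> str:
    """Issue a token bound to one agent, one scope, and a short lifetime."""
    now = time.time() if now is None else now
    claims = {"sub": agent_id, "scope": scope, "exp": now + ttl_s}
    body = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    sig = hmac.new(key, body, hashlib.sha256).hexdigest()
    return f"{body.decode()}.{sig}"


def verify(key: bytes, token: str, now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if the signature is valid and the token is unexpired, else None."""
    now = time.time() if now is None else now
    body, _, sig = token.rpartition(".")
    expected = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return None
    claims = json.loads(base64.urlsafe_b64decode(body))
    return claims if now < claims["exp"] else None
```

Because each token names a single agent in `sub`, a log entry that records the token's claims can say which agent acted, which a shared service token cannot.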
The third is identity-layer registration. Make every agent a first-class entity in the IdP. Owner, lifecycle, scope manifest, retirement date. The same machinery that handles human users handles agents. When the engineer who built the agent leaves, the agent's identity is part of the offboarding flow.
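A sketch of the registry record this implies, with a query an offboarding flow could run. The `AgentIdentity` fields mirror the attributes named above; the class and function names are hypothetical, and a real deployment would store these in the IdP rather than in a Python list.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class AgentIdentity:
    agent_id: str
    owner: str        # the accountable human
    scopes: tuple     # the scope manifest
    retire_on: date   # planned retirement date
    active: bool = True


def needs_review(registry: list, departing_owner: str, today: date) -> list:
    """IDs of active agents owned by someone leaving, or past their retirement date."""
    return [
        a.agent_id
        for a in registry
        if a.active and (a.owner == departing_owner or a.retire_on <= today)
    ]
```

Each returned agent needs an explicit decision: reassign it to a new owner or retire it. Leaving it running under a departed owner is the agent equivalent of an orphaned account.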
These three don't get you to the Advanced tier. They close the most common ways the chain breaks today.
What apparent governance looks like
The honest test for whether your trust chain holds is harder than reviewing a dashboard. Pick one action your agent performed last week. Try to reconstruct, in audit data you actually own, every link from the human prompt to the API call. The human's identity. The session intent. The agent's identity. The credential issued. The scope it carried. The action taken. The resource hit. The result.
If any link is opaque or assumed, the chain is broken there. The dashboard's green status doesn't fill it in. The vendor's claim doesn't fill it in. The proxy's log entry doesn't fill it in.
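The test can be partly mechanized. The sketch below takes one reconstructed audit record and reports which of the eight links are missing or empty. The key names are assumptions chosen to match the list above; map them to whatever your own logs actually record.

```python
# Hypothetical key names, one per link in the reconstruction.
REQUIRED_LINKS = (
    "human_identity",
    "session_intent",
    "agent_identity",
    "credential_id",
    "scope",
    "action",
    "resource",
    "result",
)


def broken_links(record: dict) -> list:
    """Return the links that are absent or empty in a reconstructed audit record."""
    return [key for key in REQUIRED_LINKS if not record.get(key)]
```

An empty list only shows that every field is populated. It does not show that the values are true, so the sampled action still needs a human to confirm each link against the system that produced it.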
Apparent governance is the steady state for teams that don't run that test. End-to-end trust is what happens when the test is part of the operating cadence.