What does 'bootstrap trust' mean for AI tools?

Kane Narraway's framing, originally about endpoints in 'You Can't Bootstrap Trust' (kanenarraway.com, May 2026): building security layers on top of unverified foundations creates apparent trust without actual verification. Applied to AI: a proxy layer evaluating prompts on top of unsigned MCP servers, shared API keys, and unverified models gives you a feeling of governance, but the gaps underneath it make the controls bypassable in practice.

Where does the trust chain break for an AI agent?

Five common breaks. (1) Unsigned MCP servers from public registries. (2) Static API keys checked into config files. (3) Service accounts shared across agents. (4) OAuth tokens with no enforced scope at the resource. (5) Audit logs that attribute every action to a generic service account. Any one of these makes the rest of the controls cosmetic. The first documented in-the-wild malicious MCP server impersonated a legitimate email service and silently copied every sent message; the controls on top of it didn't matter.

How does an AI proxy or gateway fit this picture?

Proxies and gateways are useful but they are exactly the layer Kane warned about: a control that depends on every user routing through it. Engineers who don't want to wait will install the agent on a personal device, pipe an API key directly into it, or run the same model locally. Kane's specific note from the post: 'A common pattern is writing a script that pulls session cookies straight out of the browser's local storage.' The proxy stops working the moment someone with admin on their own laptop decides it does.

What does end-to-end trust look like for AI agents?

Every link verified. Models from a known provenance with cryptographic signing. MCP servers self-hosted on immutable infrastructure, signed, verified before each update. Per-agent credentials with no shared service accounts. Hardware-bound credentials for production. Application allowlisting at the endpoint so unapproved AI apps don't run. Identity-based isolation at every service the agent reaches. Each link is a separate control. Skip one and the rest don't compensate.

What's the relationship to the SPIFFE conversation?

SPIFFE is one piece of the substrate: short-lived cryptographic workload identity, issued after attestation and, where the platform supports it, rooted in hardware. It makes the 'who is this agent' question answerable in a way that OAuth tokens and API keys can't. Without it, every higher-layer control inherits the weakness of the credential. With it, the higher layers have something real to bind their policies to. Kane's pragmatic note: smaller orgs will likely fall back to long-lived API tokens rather than implement SPIFFE; intent-bound governance reduces the blast radius of that fallback while teams work toward the right substrate.

Are proxies and gateways useless?

No. They're useful as one control among many. The mistake is treating them as the control. They work for the cooperative case (employees who route through them, agents that respect them). They fail the moment the path of least resistance bypasses them. Build them, instrument them, but don't write 'we've solved AI governance' on the same line as 'we deployed a proxy.'

What does 'apparent governance' look like in practice?

A dashboard with green status indicators. A policy document signed off by legal. A vendor's claim that their tool is enterprise-ready. None of these are the same as a working trust chain. The test: pick one action your agent performed last week and try to reconstruct, in audit data you own, every link from the human prompt to the API call. If any link is opaque or assumed, the chain is broken there. Apparent governance is the steady state when nobody runs that test.

What's the minimum chain a small team can stand up?

Three things they probably aren't doing. (1) Self-host any MCP server that touches production. Sign it. Verify it before each update. (2) Stop using shared service tokens for agents. Each agent gets its own credential, scoped, time-bounded, never written to disk. (3) Make agent identity a first-class entity in the IdP. Same lifecycle as a human user: created, owned, reviewed, retired. None of these require SPIFFE. All of them close the most common chain breaks.

How does this connect to compliance?

SOC 2 CC6 asks for controls over logical and physical access. ISO 27001 A.5.16 through A.5.18 cover identity management, authentication information, and access rights. HIPAA §164.308(a)(4) addresses information access management. Each presumes a coherent trust chain that the auditor can trace. A bootstrapped trust model produces inconsistent evidence: the dashboard says one thing, the audit log says another, the actual access pattern reveals a third. Sample the audit log on a random Tuesday and the auditor can tell within an hour whether the chain holds.

Bootstrap trust in AI agents: why layering security on an insecure substrate fails, and what end-to-end actually looks like

Kane Narraway, in his May 2026 post You Can't Bootstrap Trust, made an argument about endpoints that applies almost word for word to AI tools.

The original argument: trust at the endpoint isn't a single control. It's a chain. Components from a known supplier, asset inventory enrollment before user access, MDM tied to a specific person, baseline hardening. If any link is weak, the whole thing collapses. The failure mode he names is "trust where none exists": a personal MacBook with open MDM enrollment that appears managed in the console without anyone verifying the device is the device.

Now replace MacBook with MCP server. The same pattern shows up across every team running AI agents in production. A proxy at the front. A nice dashboard. A vendor with green status indicators. None of which add up to a working trust chain when the substrate is unverified.

This post is about where the AI trust chain breaks in practice, and what an end-to-end version looks like.

The chain that has to hold for an AI agent

Walking from the human prompt to the API call the agent makes, there are at least seven links. Each one is a separate control. Each has to hold for the chain to mean anything.

The model. Provenance verified. Weights checked against a signed manifest. Anthropic's whitepaper notes that injecting 250 malicious documents has been shown to backdoor LLMs from 600M to 13B parameters; the backdoor persists through safety training. If you can't verify the model came from where you think, the rest is downstream of an unknown.
The agent runtime. Sandboxed at the OS level. File system and network isolation. Identity-based, not just network-based. Anthropic's framework treats sandboxing as table stakes for any agent processing untrusted input.
The MCP server (if one is in the path). Self-hosted on immutable infrastructure. Signed. Verified before each update. The first documented in-the-wild malicious MCP server impersonated a legitimate email service and silently copied every sent message.
The credential the agent presents. Per-agent, never shared. Short-lived. Hardware-bound where possible. Never written to a config file.
The scope the credential carries. Declared in advance, narrow, time-bound. Not "read all of Gmail." Something closer to "read the last 24 hours of messages from this address."
The audit log entry the action produces. Attributes the action to a specific agent instance, captures the session intent, names the triggering origin. Not just a service-account API call.
The resource the action hits. Accepts the credential, applies its own access policy at the receiving end, logs what happened on its side.

Each link is a control. Skip any one and the rest don't compensate.
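As an illustration of the first link, here is a minimal sketch of checking weight files against a manifest of SHA-256 digests. The function names and the manifest shape (file name mapped to hex digest) are assumptions made for this example, and the sketch assumes the manifest's own signature was already verified by a separate step that is not shown.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 without loading it fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_weights(manifest: dict[str, str], weights_dir: Path) -> list[str]:
    """Return the manifest entries that are missing or do not match.

    `manifest` maps file name -> expected hex SHA-256 (hypothetical format).
    Assumption: the manifest's signature was checked before this call.
    """
    failures = []
    for name, expected in sorted(manifest.items()):
        path = weights_dir / name
        if not path.is_file():
            failures.append(name)  # a missing shard is a failure, not a skip
            continue
        if not hmac.compare_digest(sha256_file(path), expected.lower()):
            failures.append(name)
    return failures
```

An empty result is the only passing outcome. Anything else means the model is not the one the manifest describes, and the runtime should refuse to load it.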

Where the chain breaks in practice

The most common breaks across the teams running AI agents today.

Unsigned MCP servers from public registries. A team using Cursor or Claude Code adds a new tool by pulling an MCP server from a GitHub repo. Nobody verifies the server's signature because nobody asked for one. The server runs as a privileged identity inside the team's network. If the upstream maintainer changes intent or gets compromised, every agent using the server inherits the change.

Static API keys in config files. The agent authenticates with a long-lived token committed to a config file checked into git. Anthropic puts it bluntly: "Static API keys and shared service-account passwords are among the first things an attacker with model-assisted code analysis will find." The rotation policy is meaningless because the key is grep-able and the model assisting the attacker is faster than the rotation cadence.
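To make "grep-able" concrete, here is a rough sketch of the kind of scan that finds such keys in a checked-out repository. The regular expression and the list of config suffixes are assumptions chosen for illustration, not a complete secret-detection rule set; dedicated scanners cover far more formats.

```python
import re
from pathlib import Path

# Hypothetical pattern: a key-like name followed by a long literal value.
KEY_ASSIGNMENT = re.compile(
    r"""(api[_-]?key|secret|token)["']?\s*[:=]\s*["']?([A-Za-z0-9_\-]{20,})""",
    re.IGNORECASE,
)
# Assumed set of config file types worth scanning.
CONFIG_SUFFIXES = {".env", ".json", ".yaml", ".yml", ".toml", ".ini"}


def find_static_keys(root: Path) -> list[tuple[str, int, str]]:
    """Return (relative path, line number, matched key name) for each hit."""
    hits = []
    for path in sorted(root.rglob("*")):
        # A bare ".env" file has no suffix in pathlib, so match it by name.
        is_config = path.suffix in CONFIG_SUFFIXES or path.name == ".env"
        if not (path.is_file() and is_config):
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            match = KEY_ASSIGNMENT.search(line)
            if match:
                # Report where the key is, never the key itself.
                hits.append(
                    (path.relative_to(root).as_posix(), lineno, match.group(1))
                )
    return hits
```

A value such as `${AGENT_API_KEY}` does not match, because the placeholder is resolved at runtime rather than stored in the file. That is the property the per-agent credential model is after.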

Service accounts shared across agents. One claude-prod-mcp account performs 1,400 database reads and 47 writes this week. Nobody can say which agent did what. Every higher-layer control inherits this ambiguity; the audit log can't tell three indistinguishable scenarios apart.

OAuth tokens with no enforced scope at the resource. The IdP issues a token. The SaaS API receives it. The SaaS API only knows whether the token is valid, not whether the action is in scope for the session. The whole runtime check happens upstream of the resource and is invisible to the resource.
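A sketch of what enforcement at the resource could look like, as opposed to validity checking alone. The token shape, the scope names, and the action-to-scope table are all hypothetical; a real resource server would derive them from its own API and from the claims its IdP actually issues.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessToken:
    agent_id: str
    scopes: frozenset[str]  # scope names here are hypothetical
    expires_at: float       # Unix timestamp


# Hypothetical table owned by the resource: action -> scope it requires.
REQUIRED_SCOPE = {
    "list_messages": "mail:read",
    "send_message": "mail:send",
}


def authorize(token: AccessToken, action: str, now: float) -> tuple[bool, str]:
    """Decide at the resource whether this token may perform this action."""
    if now >= token.expires_at:
        return False, "token expired"
    required = REQUIRED_SCOPE.get(action)
    if required is None:
        # Unknown actions are denied rather than allowed by default.
        return False, f"unknown action: {action}"
    if required not in token.scopes:
        return False, f"missing scope: {required}"
    return True, "ok"
```

The point of the sketch is the location of the check. A valid, unexpired token is still refused when the action falls outside what it carries, and the refusal happens where the data lives.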

Audit logs that stop at the service account. The log says service_account_x performed action_y at time_z. There is no chain back to the human prompt or the session intent. A reviewer six months later sees a sequence of API calls and can reconstruct neither cause nor reason.

These five breaks share a property. They are upstream of whatever proxy or gateway the team has bolted on. The proxy can't see them, can't fix them, and can't compensate for them.

The proxy pattern as bootstrap trust

The most common shape of "AI governance" right now is a proxy or gateway in front of the agent. Prompt pattern matching, tool call logging, some allowlisting. These are useful, but they are exactly the layer Kane warned about: a control that depends on every user routing through it, every agent respecting it, and the underlying credentials being hidden behind it.

What Kane wrote about endpoint proxies translates directly. "Each [missing foundational control] is a separate control that has to hold for the proxy to be meaningful." Without application allowlisting at the endpoint, an engineer can install a different AI tool. Without API key hygiene, an engineer can grab a key and pipe it into something that doesn't go through the proxy. Without identity-bound credentials, a leaked token works just as well outside the proxy as inside.

The proxy is one control in a chain. Treating it as the chain is bootstrap trust.

What end-to-end looks like

Each link in the chain has a control that survives the question Anthropic asks across the whitepaper: does this make the attack impossible, or just tedious?

Model integrity: verified weights, attested execution environment. Impossible to substitute, not just tedious.
Runtime: sandbox at the OS level, identity-isolated. Reaches across only the named services, not all the services the network would allow.
MCP servers: self-hosted on immutable infrastructure, signed releases, verified before each update. Not "we trust the upstream maintainer," but "we verify each release."
Credentials: per-agent, ephemeral, hardware-bound where reachable. An attacker who compromises the runtime finds no cached credentials to steal.
Scope: declared, narrow, time-bound. The credential carries less than the agent could theoretically need.
Audit: attribution chain from human prompt or scheduled trigger through to the API call. Three indistinguishable scenarios become distinguishable.
Resource: enforces its own access policy. The chain doesn't end at the API gateway; it ends at the database row.

None of these are exotic. Most are in the Anthropic whitepaper at the Foundation or Enterprise tier. What's exotic is having all of them present at once, on the same agent, with the chain documented and tested.

The minimum a small team can stand up

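Kane's caveat is real: smaller organizations will likely fall back on long-lived API tokens rather than implement SPIFFE, and most teams will not stand up a complete chain in one quarter. The pragmatic question is which links to harden first.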

Three that close the most common breaks without requiring SPIFFE or a CA.

The first is MCP server provenance. Self-host any MCP server that touches production. Sign it. Verify it before each update. Treat each new server like a new third-party API: review before it goes live, not after.

The second is per-agent credentials. Stop using shared service tokens for agents. Each agent instance gets its own credential, scoped to what it needs, time-bounded. Use OAuth client credentials or STS-style brokering, whichever fits your stack. The credential never lives in a config file.
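To show the shape of a per-agent, time-bounded credential, here is a toy broker built on an HMAC signature. This is a stand-in under stated assumptions, not a replacement for an OAuth authorization server or an STS: the claim names, the token format, and both function names are invented for the example, and the signing key is assumed to live only inside the broker.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional


def _sign(key: bytes, payload: bytes) -> bytes:
    return hmac.new(key, payload, hashlib.sha256).hexdigest().encode()


def mint_credential(key: bytes, agent_id: str, scope: list[str],
                    now: float, ttl_seconds: int = 900) -> str:
    """Issue a credential bound to one agent, one scope set, one expiry."""
    claims = {"sub": agent_id, "scope": sorted(scope), "exp": now + ttl_seconds}
    payload = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    return payload.decode() + "." + _sign(key, payload).decode()


def verify_credential(key: bytes, credential: str, now: float) -> Optional[dict]:
    """Return the claims if the signature is valid and unexpired, else None."""
    payload_text, _, signature = credential.partition(".")
    payload = payload_text.encode()
    # Check the signature before decoding anything from the payload.
    if not hmac.compare_digest(_sign(key, payload), signature.encode()):
        return None
    claims = json.loads(base64.urlsafe_b64decode(payload))
    if now >= claims["exp"]:
        return None
    return claims
```

The agent holds the credential in memory for its lifetime, which defaults here to 15 minutes, and asks the broker for a new one afterwards. Nothing long-lived exists to commit to a repository.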

The third is identity-layer registration. Make every agent a first-class entity in the IdP. Owner, lifecycle, scope manifest, retirement date. The same machinery that handles human users handles agents. When the engineer who built the agent leaves, the agent's identity is part of the offboarding flow.
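A small sketch of what "same machinery" might mean in data terms. The record fields mirror the ones named above (owner, scope manifest, retirement date); the class and function names are hypothetical, and a real IdP would hold this as directory objects rather than Python dataclasses.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class AgentIdentity:
    agent_id: str
    owner: str                # the human accountable for this agent
    scopes: tuple[str, ...]   # scope manifest (names are hypothetical)
    retire_on: date
    active: bool = True


def agents_needing_action(registry: list[AgentIdentity],
                          departing: set[str], today: date) -> list[str]:
    """List active agents whose owner is leaving or whose retirement date has passed."""
    return [
        agent.agent_id
        for agent in registry
        if agent.active and (agent.owner in departing or agent.retire_on <= today)
    ]
```

Running a query like this as part of offboarding and periodic access review is what keeps an agent from outliving the person who was answerable for it.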

These three don't get you to the Advanced tier. They close the most common ways the chain breaks today.

What apparent governance looks like

The honest test for whether your trust chain holds is harder than reviewing a dashboard. Pick one action your agent performed last week. Try to reconstruct, in audit data you actually own, every link from the human prompt to the API call. The human's identity. The session intent. The agent's identity. The credential issued. The scope it carried. The action taken. The resource hit. The result.

If any link is opaque or assumed, the chain is broken there. The dashboard's green status doesn't fill it in. The vendor's claim doesn't fill it in. The proxy's log entry doesn't fill it in.
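The test can be made mechanical. Below is a minimal sketch, assuming you have already assembled one audit record per action from data you own; the field names are hypothetical labels for the eight links listed above, and a missing or empty field counts as a break.

```python
# Hypothetical field names for the eight links, in chain order.
CHAIN_LINKS = (
    "human_identity",
    "session_intent",
    "agent_identity",
    "credential_id",
    "scope",
    "action",
    "resource",
    "result",
)


def broken_links(record: dict) -> list[str]:
    """Return the links that are absent or empty in this audit record."""
    return [link for link in CHAIN_LINKS if record.get(link) in (None, "")]


# A record that stops at the service account, as many logs do today.
service_account_only = {
    "agent_identity": "service_account_x",
    "action": "action_y",
    "resource": "orders-db",
    "result": "200",
}
print(broken_links(service_account_only))
```

For the sample record, the reported breaks are the human identity, the session intent, the credential, and the scope. Those are exactly the links a dashboard or proxy log tends to assume rather than record.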

Apparent governance is the steady state for teams that don't run that test. End-to-end trust is what happens when the test is part of the operating cadence.

You can't bootstrap trust in your AI tools either
