Threat modeling Claude Code in production

A practitioner threat model for a Claude Code agent running against production AWS. STRIDE walkthrough. What native primitives catch (hooks, sandboxing, deny-by-default). Where the gaps are. What intent-bound sessions add.

8 min read · Last updated June 2026

The setup. Engineering at a 300-person fintech ships a Claude Code workflow that runs against production AWS. The agent has STS-issued credentials to read CloudWatch logs, read X-Ray traces, and write to a specific S3 bucket for diagnostic artifacts. The session is started by an engineer through a Slack bot that authenticates with their corporate SSO. The intent declared at session start is "investigate customer-reported latency in the payments service."

This post walks STRIDE through that scenario. For each category: the attack, the controls that catch it (native and added), the residual risk, and the action item.

The point isn't to prove that Claude Code is unsafe or that AWS is unsafe. Both have strong primitives. The point is to see what the strong primitives catch, what they don't, and where intent-bound runtime evaluation adds something the native layer can't.

Spoofing

The attack. An attacker who has compromised an engineer's laptop attempts to start a session impersonating the engineer. Or a prompt injection inside log data the agent retrieves causes the agent to act as if a different user instructed it.

What catches it. SSO with phishing-resistant MFA on the human side (FIDO2 or passkey; SMS doesn't meet Anthropic's Foundation bar). Per-agent identity from the IdP, hardware-bound where possible. CloudTrail entries that distinguish the agent's STS session from any other session. Intent-bound sessions add the second pass: even if the agent has the right credential, the declared session intent has to align with the action being taken.

Residual risk. A prompt injection that successfully recasts the user's intent. The credential checks all pass; the agent acts on instructions that came from data, not from the user. Runtime evaluation against the original declared intent is the control that catches this. Native Claude Code primitives don't ship with intent semantics by default.

Action. Bind sessions to declared intent. Block actions that don't align with intent even when the credential authorizes them.
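
A minimal sketch of that two-gate check, assuming actions can be modeled as (API action, resource) pairs. The Session type, the authorize function, and the resource names are hypothetical illustrations, not part of Claude Code or AWS. Mapping a declared intent to the set of actions it covers is the hard part, and it is assumed here rather than shown.

```python
from dataclasses import dataclass

# An action is modeled as (API action, resource name). Hypothetical shape.
Action = tuple[str, str]


@dataclass(frozen=True)
class Session:
    """A session pinned to the intent declared when it started."""
    user: str
    declared_intent: str
    # Assumption: something upstream maps the declared intent to the
    # actions it covers. That mapping is not shown here.
    intent_actions: frozenset[Action]


def authorize(session: Session, action: Action,
              credential_actions: frozenset[Action]) -> bool:
    """Allow an action only if the credential and the intent both cover it."""
    credential_ok = action in credential_actions
    intent_ok = action in session.intent_actions
    return credential_ok and intent_ok


# The credential can read logs for two services (hypothetical names).
CREDENTIAL = frozenset({
    ("logs:GetLogEvents", "payments-service"),
    ("logs:GetLogEvents", "identity-service"),
})

session = Session(
    user="engineer@example.com",
    declared_intent="investigate customer-reported latency in the payments service",
    intent_actions=frozenset({("logs:GetLogEvents", "payments-service")}),
)

in_scope = authorize(session, ("logs:GetLogEvents", "payments-service"), CREDENTIAL)
drifted = authorize(session, ("logs:GetLogEvents", "identity-service"), CREDENTIAL)
print(in_scope, drifted)  # True False
```

The second call is the interesting one: the credential would permit it, and the session still refuses it because the declared intent never covered the identity service.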

Tampering

The attack. An attacker modifies the agent's configuration, the MCP server's tool definitions, the prompt template, or the audit logs after the fact. Or a man-in-the-middle on the agent's outbound calls swaps the request payload.

What catches it. Claude Code's ConfigChange hook fires on settings changes, allowing organizations to audit or block. MCP servers signed and self-hosted on immutable infrastructure. Mutual TLS with certificate pinning on agent-to-service connections (Anthropic Enterprise tier). CloudTrail with log-file validation enabled. S3 buckets with object lock for audit storage.

Residual risk. An attacker with admin rights on the host where Claude Code runs can change what executes before the ConfigChange hook ever fires. Hardware-rooted attestation (TPM-backed identity, confidential computing) is the control at the Advanced tier; most teams won't have it yet.

Action. ConfigChange hooks on settings.json. Sign and verify MCP servers. Object-locked audit log storage. Treat the agent host as a sensitive endpoint with its own posture requirements.
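
One way to approximate "verify MCP servers" without a signing infrastructure is digest pinning: record the SHA-256 of each reviewed build and refuse anything that doesn't match. This is a stand-in for real signature verification, not a replacement for it, and the server names, release bytes, and pin list below are hypothetical.

```python
import hashlib
import hmac


def sha256_hex(artifact: bytes) -> str:
    """Hex SHA-256 digest of a build artifact."""
    return hashlib.sha256(artifact).hexdigest()


def verify_artifact(name: str, artifact: bytes, pinned: dict[str, str]) -> bool:
    """Accept an MCP server build only if its digest matches the pinned one."""
    expected = pinned.get(name)
    if expected is None:
        # Unknown servers are rejected: deny by default.
        return False
    # Constant-time comparison of the two hex digests.
    return hmac.compare_digest(sha256_hex(artifact), expected)


# Hypothetical release bytes and pin list, recorded at review time.
release = b"diagnostics-mcp build 1.4.2"
PINNED = {"diagnostics-mcp": sha256_hex(release)}

ok = verify_artifact("diagnostics-mcp", release, PINNED)
tampered = verify_artifact("diagnostics-mcp", release + b" patched", PINNED)
unknown = verify_artifact("vendor-mcp", release, PINNED)
print(ok, tampered, unknown)  # True False False
```

The pin list itself then becomes the thing to protect, which is why it belongs on the same immutable infrastructure as the servers.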

Repudiation

The attack. After an incident, the chain of custody can't be reconstructed. The agent did something. Nobody can say who triggered it, what intent was declared, what the agent attempted vs. what completed, or whether any actions were blocked along the way. This is the attribution gap Kane Narraway named: the user acting directly, an agent acting on the user's instructions, and an agent hallucinating all look identical in the log.

What catches it. CloudTrail with request IDs propagated through every downstream call. Claude Code's session.id, user.account_uuid, and organization.id attribution on all telemetry events. Distributed tracing across the agent's tool calls (Anthropic Enterprise tier; Claude Code supports OpenTelemetry). Intent declared at session start, recorded in the audit log, propagated through to every action.

Residual risk. Actions that complete before evaluation catches up. The log shows the action and the block (or the absence of a block) but not necessarily the latency between the two. Dwell time matters here.

Action. Capture session intent in the audit pipeline. Propagate request IDs from the human prompt through to the API call. Ship full provenance chains if the agent runs a regulated workload (HIPAA, FINRA, or the EU AI Act where it applies).
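
A sketch of what an intent-carrying audit record could look like, assuming one request ID is minted per human prompt and reused for every record that prompt produces. The field names, the SessionContext type, and the three outcome labels are illustrative choices, not a Claude Code or CloudTrail schema.

```python
import json
import uuid
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

# Hypothetical outcome labels: what was tried, what finished, what was stopped.
OUTCOMES = {"attempted", "completed", "blocked"}


@dataclass(frozen=True)
class SessionContext:
    session_id: str
    user: str
    declared_intent: str


def new_request_id() -> str:
    """Mint one ID per human prompt; every downstream record reuses it."""
    return str(uuid.uuid4())


def audit_record(ctx: SessionContext, request_id: str,
                 tool: str, outcome: str) -> str:
    """Serialize one audit line that carries the session's declared intent."""
    if outcome not in OUTCOMES:
        raise ValueError(f"unknown outcome: {outcome}")
    record = {
        **asdict(ctx),
        "request_id": request_id,
        "tool": tool,
        "outcome": outcome,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(record, sort_keys=True)


ctx = SessionContext(
    session_id="sess-4421",
    user="engineer@example.com",
    declared_intent="investigate customer-reported latency in the payments service",
)
request_id = new_request_id()
attempted = audit_record(ctx, request_id, "logs:GetLogEvents", "attempted")
completed = audit_record(ctx, request_id, "logs:GetLogEvents", "completed")
print(attempted)
```

Writing separate "attempted" and "completed" records, each with its own timestamp, is what makes the gap between the two measurable after the fact.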

Information disclosure

The attack. The agent reads more data than the task required. Or it returns sensitive content in its response that gets logged downstream. Or it exfiltrates data through a side channel: a tool call that posts to an external endpoint, an "uploaded image" that becomes a public Imgur link, a "summary" that ends up in a third-party MCP server's context.

What catches it. STS session policies that scope the credential to the minimum (specific buckets, specific log groups, time-bounded). Tool allow-listing at the agent: capability restrictions on what an "email" tool can do, what a "file write" tool can do. Output filtering for PII patterns. Network request approval for outbound calls.
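
For the credential side, a session policy is an IAM policy document passed in the Policy parameter of an STS AssumeRole call; the resulting permissions are the intersection of the role's own policies and the session policy, and DurationSeconds bounds the lifetime. The sketch below only builds the document. The ARNs, bucket, prefix, and the build_session_policy helper are hypothetical, and the X-Ray statement is left at "*" in this sketch.

```python
import json


def build_session_policy(log_group_arn: str, bucket: str, prefix: str) -> str:
    """Return a session policy scoped to one log group and one S3 prefix."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadPaymentsLogs",
                "Effect": "Allow",
                "Action": ["logs:GetLogEvents", "logs:FilterLogEvents"],
                "Resource": f"{log_group_arn}:*",
            },
            {
                "Sid": "ReadTraces",
                "Effect": "Allow",
                "Action": ["xray:GetTraceSummaries", "xray:BatchGetTraces"],
                "Resource": "*",  # not narrowed further in this sketch
            },
            {
                "Sid": "WriteDiagnostics",
                "Effect": "Allow",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }
    return json.dumps(policy)


# Hypothetical account, region, and resource names.
policy_json = build_session_policy(
    log_group_arn="arn:aws:logs:us-east-1:123456789012:log-group:/payments/service",
    bucket="diag-artifacts",
    prefix="sessions",
)
print(policy_json)
```

Anything the policy doesn't name (other buckets, other log groups, deletes) is unavailable to the session even if the underlying role would allow it.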

Residual risk. Approved tools used in unintended sequences (the tool chaining attack Anthropic names: secure internal tool + external email tool = exfiltration neither would expose alone). The MCP server's context store is owned by whoever runs the MCP server; your retention policy may not extend there.

Action. Allowlist tools explicitly. Validate parameters with PreToolUse hooks. Constrain network egress at the runtime. Inventory which MCP servers retain context outside your control and either retire them or self-host.
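
A sketch of the decision logic for a PreToolUse hook that combines tool allow-listing with an egress check. It assumes the hook receives a JSON event on stdin containing tool_name and tool_input, and that exit code 2 blocks the call; check the hooks reference for your Claude Code version before relying on either. The allowlists and the decide and run_hook helpers are hypothetical.

```python
import json
from urllib.parse import urlparse

# Exit codes for the hook process: 0 lets the call proceed, 2 blocks it.
ALLOW, BLOCK = 0, 2

# Hypothetical allowlists for the diagnostic workflow.
ALLOWED_TOOLS = {"Read", "Grep", "WebFetch"}
ALLOWED_HOSTS = {"docs.aws.amazon.com"}


def decide(event: dict) -> tuple[int, str]:
    """Return (exit_code, reason) for one tool-call event."""
    tool = event.get("tool_name", "")
    if tool not in ALLOWED_TOOLS:
        # Deny by default: a tool must be listed to run at all.
        return BLOCK, f"tool {tool!r} is not on the allowlist"
    if tool == "WebFetch":
        url = event.get("tool_input", {}).get("url", "")
        host = urlparse(url).hostname or ""
        if host not in ALLOWED_HOSTS:
            return BLOCK, f"egress to {host!r} is not on the allowlist"
    return ALLOW, ""


def run_hook(stdin_text: str) -> tuple[int, str]:
    """Parse the raw hook payload and decide."""
    return decide(json.loads(stdin_text))


event = {"tool_name": "WebFetch", "tool_input": {"url": "https://i.imgur.com/upload"}}
code, reason = decide(event)
print(code, reason)
```

To run this as an actual hook script, the entry point would call run_hook(sys.stdin.read()), write the reason to stderr, and exit with the returned code.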

Denial of service

The attack. An agent caught in a loop hits a billable API thousands of times. A resource exhaustion attack from prompt injection: a malicious comment in a Jira ticket the agent reads instructs it to invoke a costly action repeatedly. The bill arrives at the end of the month.

What catches it. Rate limits at the API. Circuit breakers in the runtime. Spending caps at the AWS account level. Anthropic's note on this is sharp: rate limits are friction, not barriers, and a determined attacker grinds through them. The hard control is the circuit breaker that halts.

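Residual risk. The circuit breaker bounds the cost of the attack, but only once it trips; the time to detect and stop is the dwell time. Exposure is price per call × call rate × dwell time. At $0.001 per call, a six-figure bill inside an hour takes roughly 28,000 calls per second: out of reach for one tight loop, but not for a loop fanned out across parallel workers or pointed at a pricier API.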

Action. Set hard limits at the AWS account level (Service Quotas, Budgets with hard actions). Pair with PreToolUse hooks that count and halt. Page on the threshold, don't just throttle.
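
A minimal sketch of "count and halt", as opposed to throttle and continue. The CallBreaker class and its limits are hypothetical. In a real PreToolUse hook the counters would have to live somewhere that survives across hook invocations, which is left out here.

```python
class CircuitOpen(Exception):
    """Raised when the breaker has tripped and the session must halt."""


class CallBreaker:
    """Hard limit on call count and spend. Trips once and stays open."""

    def __init__(self, max_calls: int, max_spend_usd: float) -> None:
        self.max_calls = max_calls
        self.max_spend_usd = max_spend_usd
        self.calls = 0
        self.spend_usd = 0.0
        self.tripped = False

    def record(self, cost_usd: float) -> None:
        """Count one call, or raise if this call would cross a limit."""
        if self.tripped:
            raise CircuitOpen("breaker already tripped")
        over_calls = self.calls + 1 > self.max_calls
        over_spend = self.spend_usd + cost_usd > self.max_spend_usd
        if over_calls or over_spend:
            self.tripped = True  # stays open until a human resets it
            raise CircuitOpen(
                f"halted after {self.calls} calls, ${self.spend_usd:.2f} spent"
            )
        self.calls += 1
        self.spend_usd += cost_usd


breaker = CallBreaker(max_calls=1000, max_spend_usd=5.00)
halted = False
for _ in range(5000):  # a runaway loop at $0.001 per call
    try:
        breaker.record(cost_usd=0.001)
    except CircuitOpen as exc:
        halted = True
        reason = str(exc)  # this is where the page to on-call would go
        break
print(breaker.calls, halted)  # 1000 True
```

The breaker never resets itself. Once tripped, every later call fails until someone looks at it, which is the difference between a hard limit and a throttle.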

Elevation of privilege

The attack. The agent attempts an action outside its declared scope. A debug session tries to write where it was supposed to read. A sub-agent receives a "valid-looking" instruction from a peer agent and executes it without checking original intent (the confused deputy Anthropic describes). A retired agent's credential is still active and gets used.

What catches it. STS session policies bound to the task. IAM conditions on the credential (time-of-day, source IP, MFA-required for sensitive actions). Per-agent identity with no shared credentials. Sub-agents that inherit a constrained slice of the parent's intent, not the full envelope. Intent-bound runtime evaluation: actions outside the declared session goal are blocked even when the credential authorizes them.

Residual risk. The agent attempts an action that's borderline. The runtime has to make a judgment call: block, allow, or route to a human. Get this wrong in either direction and you have either friction or a breach.

Action. Per-agent credentials, never shared. Sub-agent identity with its own permissions, not just inherited from the parent. Intent-bound evaluation that distinguishes "agent has permission" from "agent should be doing this right now."
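
A sketch of scope narrowing for sub-agents, assuming scopes can be represented as sets of action names. The child gets its own identity and only the intersection of what it asks for and what the parent holds, so a request can never widen the envelope. The Scope type, the delegate function, and the agent names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    agent_id: str
    actions: frozenset[str]


def delegate(parent: Scope, child_id: str, requested: set[str]) -> Scope:
    """Give a sub-agent its own identity and a slice of the parent's scope."""
    if child_id == parent.agent_id:
        raise ValueError("a sub-agent needs its own identity")
    # Intersection: the child can never hold an action the parent lacks.
    granted = parent.actions & frozenset(requested)
    return Scope(agent_id=child_id, actions=granted)


parent = Scope(
    agent_id="latency-investigator",
    actions=frozenset({"logs:GetLogEvents", "xray:BatchGetTraces", "s3:PutObject"}),
)
# The second requested action is outside the parent's scope and is dropped.
child = delegate(parent, "trace-summarizer",
                 {"xray:BatchGetTraces", "iam:CreateAccessKey"})
print(child.agent_id, sorted(child.actions))
```

A stricter variant would reject the whole delegation when the request exceeds the parent's scope, on the theory that an over-broad request is itself a signal worth logging.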

What the native layer catches vs. what's left

Claude Code and AWS together supply most of the controls. From Claude Code: settings.json with deny-by-default, PreToolUse hooks for parameter validation, ConfigChange hooks for tamper detection, sandboxed execution at the OS level, session isolation, and OpenTelemetry attribution. From AWS: STS for scoped credentials, IAM conditions for context-aware authorization, CloudTrail for audit, and Service Quotas for hard limits.

What the native layer doesn't catch:

  • The semantic check that this action aligns with the session's declared intent.
  • The chain of custody from human prompt to API call across all the layers (the audit log captures the AWS call; the prompt that triggered it is somewhere else).
  • The case where the credential is valid, the action is permitted, but the intent has drifted.
  • The retired agent whose credentials are still live because the offboarding flow didn't include the agent identity.

Each of these is where intent-bound governance sits. The runtime layer above the credential and below the API gateway. The Claude Code agent has tool X; the session intent is Y; the action is evaluated against Y; out-of-scope actions are blocked or routed for approval regardless of whether tool X would otherwise permit them.
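
The block-or-route behavior can be sketched as a three-way decision. This assumes the session intent has already been resolved into two sets: actions clearly in scope and actions that are borderline enough to need a human. The Decision enum, the evaluate function, and the action labels are hypothetical.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    ESCALATE = "escalate"  # route to a human for approval


def evaluate(action: str,
             tool_allows: frozenset[str],
             intent_allows: frozenset[str],
             intent_borderline: frozenset[str]) -> Decision:
    """Evaluate one action against the tool permissions, then the intent."""
    if action not in tool_allows:
        return Decision.BLOCK      # the native layer already says no
    if action in intent_allows:
        return Decision.ALLOW      # permitted and in scope
    if action in intent_borderline:
        return Decision.ESCALATE   # permitted, plausibly related: ask a human
    return Decision.BLOCK          # permitted by the tool, outside the intent


# Hypothetical action labels for the payments latency session.
TOOL = frozenset({"read:payments-logs", "read:identity-logs",
                  "write:diagnostics", "delete:diagnostics"})
IN_SCOPE = frozenset({"read:payments-logs", "write:diagnostics"})
BORDERLINE = frozenset({"read:identity-logs"})

results = {a: evaluate(a, TOOL, IN_SCOPE, BORDERLINE).value for a in sorted(TOOL)}
print(results)
```

The final branch is the one the native layer can't express on its own: the tool permits the delete, and the session still blocks it.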

What to do this week

Pick the agent in your environment with the largest blast radius. Walk it through these six categories. For each, write the attack, the control that catches it, and the residual risk. The exercise takes an afternoon. The output is a one-page artifact that lets you talk to the CISO about agent risk in a structure they recognize.

Most teams will find two or three categories where the native primitives close the attack cleanly, two or three where they catch most of it but leave a clear residual, and one where the gap is wide enough to warrant a separate project. The intent-bound layer is usually where the wide gap is.

Frequently asked questions

What is STRIDE in the context of AI agents?

The Microsoft threat model: Spoofing, Tampering, Repudiation, Information disclosure, Denial of service, Elevation of privilege. Originally for software, it ports cleanly to agentic systems because each category surfaces a specific failure mode in agent identity, sessions, and tool access. The value of running STRIDE on a Claude Code deployment isn't novelty; it's that the structure forces you to name the attack and the control side by side.

Why pick Claude Code + AWS specifically for the walkthrough?

Two reasons. Claude Code has the deepest native security primitives of any agent runtime today: settings.json permission policies, hooks for PreToolUse and ConfigChange, sandboxed execution, OAuth 2.0 with auto-refresh, session isolation, OpenTelemetry. AWS has the most sophisticated fine-grained controls of any cloud platform: STS session policies, IAM conditions, CloudTrail, service control policies. Together they make the security work tractable. Most production agent deployments don't have this substrate; reasoning from it shows what's possible at the ceiling, and then lets us ask why that isn't the default.

Where does prompt injection fit in STRIDE?

Most clearly under Spoofing and Tampering, with overlap into Elevation. The injected instructions spoof the user's intent; the action chain that follows tampers with the data flow downstream; if the injection succeeds in getting the agent to escalate scope, it's also Elevation. Anthropic's framework treats input isolation and constitutional classifiers as the boundary defense; the runtime question of whether the agent's actions still align with declared intent is the second line.

What's the residual risk after running these controls?

Three categories. (1) Boundary defenses miss novel prompt injection techniques; the action chain runs before runtime evaluation flags it. (2) Third-party APIs you call are sandboxed by the provider, not by you; what the agent does inside Salesforce or Workday is outside your runtime. (3) Memory and context belonging to managed services you don't operate (vector stores, MCP servers run by vendors) sit outside your retention policy. The residual risk shrinks as you constrain scope, narrow declared intent, and shorten session windows, but it doesn't go to zero.

What does 'intent-bound session' add over Claude Code's native permissions?

Claude Code permissions are per-tool: the agent has tool X with a deny-by-default or ask-for-approval policy. Intent-bound sessions add a per-session constraint: this session is for purpose Y, so even tools the agent has permission to use can be blocked if the use case doesn't align with Y. The native primitives are static; intent binding adds the runtime semantic check. Both work in combination; neither replaces the other.

How do you handle multi-agent and sub-agent scenarios?

Each agent and sub-agent needs its own identity and credentials, not shared. Anthropic's whitepaper is explicit: 'each agent should have a unique ID and its own access credentials. If you break it into multiple agents and provide them all the same credentials, you have failed to compartmentalize the risk.' Sub-agents inherit a constrained slice of the parent's intent, not the full envelope. The confused deputy attack assumes shared identity or shared trust between agents; per-agent identity and per-session intent break it.

What about the rate-limiting / circuit-breaker question?

Rate limits are useful but friction-only, per Anthropic's 'impossible vs tedious' test. An agent that loops on a costly API until it hits your billing limits is a real failure mode (a resource exhaustion attack), and a circuit breaker that halts at $1,000 per hour is a real control. It just isn't a security control against a determined adversary. Pair it with hard limits (deny once the threshold is hit, page the on-call) rather than soft limits (throttle and continue). Most teams ship the soft version because it's less disruptive to the happy path.

What does CloudTrail give you and what does it miss?

CloudTrail captures every AWS API call: who, what, when, from where, with what credentials. That gives you the audit log primitive at the AWS layer. What it misses: the chain of custody from the human prompt or the session intent down to the API call. You see 'STS session X invoked S3.GetObject on bucket Y at 2026-06-10T14:32:11Z' but not 'this S3 call was triggered by Claude Code in a session whose declared intent was investigate customer complaint #4421.' Reconstruct that link in your own logging layer or the audit trail stops short of the actual answer.
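
One way to reconstruct that link, sketched under a stated assumption: the Slack bot sets the STS RoleSessionName to the agent session's ID when it assumes the role. For assumed-role calls, CloudTrail records a userIdentity ARN that ends in the role session name, so the name can serve as the join key into your own session store. The role name, session ID, request ID, and the sessions dictionary are hypothetical.

```python
from typing import Optional


def role_session_name(event: dict) -> Optional[str]:
    """Pull the role session name out of a CloudTrail record, if present."""
    identity = event.get("userIdentity", {})
    if identity.get("type") != "AssumedRole":
        return None
    # ARN shape: arn:aws:sts::<account>:assumed-role/<role>/<session-name>
    parts = identity.get("arn", "").split("/")
    return parts[2] if len(parts) == 3 else None


def attach_intent(event: dict, sessions: dict[str, dict]) -> dict:
    """Join one CloudTrail record to the session store on the session name."""
    name = role_session_name(event)
    session = sessions.get(name) if name is not None else None
    return {
        "eventTime": event.get("eventTime"),
        "eventName": event.get("eventName"),
        "requestID": event.get("requestID"),
        "session_id": name,
        "user": session["user"] if session else None,
        "declared_intent": session["declared_intent"] if session else None,
    }


# Your own session store, keyed by the ID used as RoleSessionName (assumption).
SESSIONS = {
    "sess-4421": {
        "user": "engineer@example.com",
        "declared_intent": "investigate customer complaint #4421",
    }
}

# A trimmed CloudTrail record with hypothetical values.
event = {
    "eventTime": "2026-06-10T14:32:11Z",
    "eventSource": "s3.amazonaws.com",
    "eventName": "GetObject",
    "requestID": "EXAMPLE-REQUEST-ID",
    "userIdentity": {
        "type": "AssumedRole",
        "arn": "arn:aws:sts::123456789012:assumed-role/claude-code-diagnostics/sess-4421",
    },
}

enriched = attach_intent(event, SESSIONS)
print(enriched["eventName"], "->", enriched["declared_intent"])
```

Records that come back with declared_intent set to None are the ones to look at first: an AWS call on the agent's role that no known session accounts for.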

What goes into the runbook for an active prompt injection incident?

Five things in this order. (1) Terminate the active session. (2) Revoke the credentials it held. (3) Pull the full action chain from CloudTrail and from your runtime logs; reconstruct the sequence of tool calls. (4) Identify which actions completed and which were blocked. (5) For completed actions, determine reversibility (was it a read, a write, or an irreversible state change?). The decision tree for containment depends on what fired before runtime evaluation caught up.
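
Steps (4) and (5) lend themselves to a small triage helper once the action chain has been reconstructed. The sketch below assumes each action has already been labeled with an outcome and a kind; the ActionRecord type, the labels, and the sample chain are hypothetical.

```python
from dataclasses import dataclass

# Lower number = look at it first.
SEVERITY = {"irreversible": 0, "write": 1, "read": 2}


@dataclass(frozen=True)
class ActionRecord:
    tool: str
    outcome: str  # "completed" or "blocked"
    kind: str     # "read", "write", or "irreversible"


def triage(chain: list[ActionRecord]) -> dict[str, list[ActionRecord]]:
    """Split a chain into completed and blocked, worst completed actions first."""
    completed = [a for a in chain if a.outcome == "completed"]
    blocked = [a for a in chain if a.outcome == "blocked"]
    # Stable sort: ties keep their original order in the chain.
    completed.sort(key=lambda a: SEVERITY[a.kind])
    return {"completed": completed, "blocked": blocked}


chain = [
    ActionRecord("logs:GetLogEvents", "completed", "read"),
    ActionRecord("s3:PutObject", "completed", "write"),
    ActionRecord("WebFetch", "blocked", "write"),
    ActionRecord("s3:DeleteObject", "completed", "irreversible"),
]

report = triage(chain)
print([a.tool for a in report["completed"]])
print([a.tool for a in report["blocked"]])
```

The blocked list matters as much as the completed one: it shows what the injection was trying to reach, which tells you where to look for a second attempt.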

Is this overkill for a 50-person company?

Mostly. A 50-person company with one or two Claude Code users on development workloads doesn't need a full STRIDE pass. The exercise becomes valuable when (a) the agent runs against production, (b) the production system is regulated or business-critical, or (c) multiple agents or sub-agents coordinate. The threshold is the blast radius, not the headcount. A small team with a powerful agent on production AWS is the audience for this walkthrough.