Claude Agent SDK and Managed Agents: Where to Run Production Agents

The conversation about AI agents has moved fast. A year ago it was about whether agents worked at all. Six months ago it was about how to build them well. Today, for teams that have shipped agents into production or are about to, the question is one of infrastructure: where does the agent actually run, and who is responsible when something goes wrong at three in the morning?
Anthropic now ships two answers to that question, and they sit at different points on the build-vs-buy spectrum. The Claude Agent SDK is a library (Python and TypeScript) that lets you run the same agent loop that powers Claude Code inside your own process, on your own infrastructure. Claude Managed Agents, launched in April 2026, is the opposite: a hosted REST API where Anthropic runs the harness, the sandbox, and the session log on its infrastructure, and your application sends events and receives results. Same underlying capability. Very different operational shape.
The decision between them isn't binary. Anthropic's own published guidance is to prototype with the Agent SDK and graduate to Managed Agents for production. This guide covers what each is for, the architectural insight that drove the Managed Agents design (decoupling the brain from the hands, in Anthropic's framing), how to pick between them for a given piece of work, and what the path from one to the other looks like. The conclusion most production teams reach is that they need both.
The three abstractions: brain, hands, session
Before pulling the SDK and Managed Agents apart, it helps to understand the architecture both are built on, because Anthropic's framing is unusually clear about what's actually happening inside an agent. In an April 2026 engineering post, the team described the design philosophy behind Managed Agents as decoupling the brain from the hands, and the framing is worth borrowing whole. It applies just as well to the SDK.
Every agent, in this view, has three components that can live in different places:
  • The brain. Claude plus the harness (the loop that decides what to do next, routes tool calls, and feeds results back into the next turn). This is the part that reasons.
  • The hands. The sandboxes and tools that actually do things: read files, run code, call APIs, execute git commands. This is the part that acts.
  • The session. The append-only log of everything that happened: every event, every tool call, every result. This is the part that remembers.
The insight Anthropic landed on while building Managed Agents was that these three components should be independently swappable. The brain shouldn't assume where the hands are running. The hands shouldn't hold the credentials. The session shouldn't live inside the harness. When the three are decoupled, the whole system becomes far more recoverable: if the container running the hands dies, you spin up a new one and the brain catches the failure as a tool-call error. If the harness crashes, you wake a new one and it reads the session log to figure out where to pick up.
In the older, coupled design Anthropic started with, all three lived in one container. That meant when something went wrong, you couldn't tell which component had failed (the harness, the network, or the sandbox all looked the same from outside), and you couldn't replace one without disturbing the others. Anthropic's word for this is "pet": a hand-tended, named server that you can't afford to lose. The opposite is "cattle": interchangeable, replaceable, no individual member precious. Modern agent infrastructure wants the cattle pattern.
The Agent SDK and Managed Agents both use the same conceptual model. The difference is where you draw the operational boundary.
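To make the decoupling concrete, here is a minimal sketch in plain Python. It is not the Agent SDK or the Managed Agents API: SessionLog, Hand, and run_tool are hypothetical names invented for illustration. It shows only the recovery property described above, where a dead sandbox surfaces to the brain as an ordinary tool-call error and a fresh one takes its place.

```python
from dataclasses import dataclass, field


@dataclass
class SessionLog:
    """The session: an append-only record that outlives brain and hands."""
    events: list = field(default_factory=list)

    def append(self, kind: str, payload: str) -> None:
        self.events.append({"seq": len(self.events), "kind": kind, "payload": payload})


class Hand:
    """The hands: a disposable sandbox. healthy=False simulates a dead container."""

    def __init__(self, healthy: bool = True):
        self.healthy = healthy

    def execute(self, command: str) -> str:
        if not self.healthy:
            raise RuntimeError("sandbox unavailable")
        return f"ran: {command}"


def run_tool(log: SessionLog, hand: Hand, command: str, provision=Hand) -> Hand:
    """The brain's view of one tool call (hypothetical helper).

    A dead hand is recorded as a tool error, then a fresh hand is
    provisioned and the call is retried once.
    """
    log.append("tool_call", command)
    try:
        log.append("tool_result", hand.execute(command))
    except RuntimeError as exc:
        log.append("tool_error", str(exc))
        hand = provision()  # cattle, not pets: replace rather than repair
        log.append("tool_result", hand.execute(command))
    return hand


log = SessionLog()
hand = run_tool(log, Hand(healthy=False), "pytest -q")
print([event["kind"] for event in log.events])
```

Because the log is a separate object from both the brain and the hand, nothing about the failure is lost when the hand is thrown away.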
[Interactive figure: Brain, hands, session. From coupled "pet" to decoupled "cattle". Anthropic's published reasoning behind Managed Agents, in two steps. Step 1, coupled (the old pattern): all three components in one container. Step 2, decoupled (the current design): brain, hands, and session as independent interfaces.]
The Agent SDK: agents in your process
The Claude Agent SDK is a Python and TypeScript library that gives you the same harness Claude Code runs on, exposed as a programmable interface. Your application calls query(), passes a prompt and a set of options, and gets back a stream of messages as Claude reasons, calls tools, and produces a result. The agent loop, the tool execution, and the context management are handled by the SDK. You don't implement them.
A minimal example in Python:
    import asyncio

    from claude_agent_sdk import query, ClaudeAgentOptions


    async def main():
        async for message in query(
            prompt="Find and fix the bug in auth.py",
            options=ClaudeAgentOptions(
                allowed_tools=["Read", "Edit", "Bash"],
            ),
        ):
            print(message)  # Claude reads, finds, edits


    asyncio.run(main())
That code spins up the same agent loop that powers Claude Code, inside your Python process, with Claude reading auth.py from your local filesystem and applying fixes directly. The pattern in TypeScript is functionally identical.
What you get out of the box
The SDK ships with the same built-in tools as Claude Code: Read, Write, Edit, Bash, Glob, Grep, WebSearch, WebFetch, plus more specialized ones like Monitor (watch a background script and react to output) and AskUserQuestion (let Claude ask the user multiple-choice questions when it's uncertain). You don't have to wire any of these up; declaring them in allowed_tools is enough.
It also supports everything else Claude Code does, programmatically:
  • Skills. The same SKILL.md folders Claude Code uses, loaded from .claude/skills/ in your working directory or globally.
  • Subagents. Define specialized agents inline with AgentDefinition, set their permitted tools and system prompts, and let your main agent delegate.
  • MCP servers. Connect to external systems with a single config option. The agent can use any MCP-compatible tool you have running.
  • Hooks. Run custom code at lifecycle points (PreToolUse, PostToolUse, Stop, SessionStart, UserPromptSubmit). This is the SDK's audit and guardrail layer: log every file change, block dangerous operations, transform inputs, validate outputs.
  • Sessions. Capture a session ID, resume later, or fork to explore alternatives. Session state is stored as JSONL on your filesystem.
  • Permissions. Granular control over what each agent can do. Pre-approve safe tools, require user approval on sensitive ones, block specific operations entirely.
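The hooks and permissions items above come down to one idea: a function sees each tool call before it runs and can record or refuse it. The sketch below shows that idea in plain Python so it runs without the SDK. The function name, its arguments, the return shape, and the blocked patterns are all assumptions made for illustration; the SDK's real hook callbacks have their own signature and registration mechanism, which you should take from its documentation.

```python
import re

# Hypothetical policy: Bash commands matching any of these are refused.
BLOCKED_BASH = [
    re.compile(r"\brm\s+-rf\s+/"),
    re.compile(r"\bgit\s+push\s+--force\b"),
]

audit_log = []  # every decision is recorded, allowed or not


def pre_tool_use(tool_name, tool_input):
    """Decide whether a tool call may proceed, and log the decision."""
    decision = {"tool": tool_name, "allow": True, "reason": ""}
    if tool_name == "Bash":
        command = tool_input.get("command", "")
        for pattern in BLOCKED_BASH:
            if pattern.search(command):
                decision.update(allow=False, reason=f"matched {pattern.pattern}")
                break
    audit_log.append(decision)
    return decision


print(pre_tool_use("Bash", {"command": "rm -rf /var/data"}))
print(pre_tool_use("Read", {"file_path": "auth.py"}))
```

Keeping the policy in a plain function like this also makes it easy to unit test before wiring it into a real hook.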
When the SDK is the right call
The SDK fits when the agent needs to work directly with resources you own and operate. Files in your codebase. Services in your VPC. Databases the agent needs network access to. Tools you've built in-house. If the agent's work lives on your infrastructure, running the agent on your infrastructure removes a whole class of networking, permissioning, and trust-boundary problems.
It also fits for local development and prototyping. Anthropic's own recommendation is to prototype agents with the SDK first, because the iteration loop is tight (write code, run it, see what happens) and you have full visibility into every message. CI/CD pipelines are another natural fit: the agent runs inside the same process as the rest of your pipeline, with the same observability, the same logging, the same secrets management.
What you take on
Running an agent in your own process means you operate the infrastructure. The session log lives on your filesystem. The sandbox isolation, if you need it, is your responsibility. If a long-running agent takes hours and your process restarts, recovery is up to you. None of this is hard, but it's work that Managed Agents would handle on your behalf, and it adds up if you're operating many agents at production scale.
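As one example of what "recovery is up to you" can mean in practice, the sketch below persists a session identifier and a progress marker to disk so that a restarted process can find them again. It uses only the standard library. The checkpoint layout, the file name, and the idea of tracking a step counter are assumptions for illustration; how you then resume the session depends on the SDK's own session options.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def save_checkpoint(path: Path, session_id: str, step: int) -> None:
    """Write the checkpoint atomically so a crash mid-write cannot corrupt it."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        json.dump({"session_id": session_id, "step": step}, tmp)
    os.replace(tmp_name, path)  # atomic when both paths are on the same filesystem


def load_checkpoint(path: Path) -> Optional[dict]:
    """Return the last checkpoint, or None on a first run."""
    if not path.exists():
        return None
    return json.loads(path.read_text())


# Hypothetical file name, placed in a scratch directory for the demo.
checkpoint_file = Path(tempfile.mkdtemp()) / "agent_checkpoint.json"
first_run = load_checkpoint(checkpoint_file)
save_checkpoint(checkpoint_file, "session-123", step=4)
restored = load_checkpoint(checkpoint_file)
print(first_run, restored)
```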
There's also a billing note worth knowing. Starting June 15, 2026, Agent SDK and claude -p usage on subscription plans draws from a new monthly Agent SDK credit that's separate from your interactive usage limits. If you're building production agents on the SDK, check how the credit math works for your expected volume.
Managed Agents: agents on Anthropic's infrastructure
Managed Agents is what you get when you take the decoupled brain/hands/session architecture and host it as a service. Anthropic launched it on April 8, 2026 as a hosted REST API in the Claude Platform. Your application sends events into a session, the harness runs on Anthropic's infrastructure, the sandbox spins up on Anthropic's infrastructure, the session log is durably stored on Anthropic's infrastructure, and your application streams events back as the work progresses.
The framing Anthropic uses for the design is that Managed Agents is a meta-harness: unopinionated about the specific harness that runs inside it, but opinionated about the interfaces around the harness. The reasoning, from the engineering post: "we can't predict what specific context engineering will be required in future models." The interfaces (session, sandbox, harness) are designed to outlast any specific implementation.
What the surface area looks like
From the application side, Managed Agents is a small set of operations that map onto the three abstractions:
  • Session: getSession(id), getEvents(), emitEvent(id, event). The session log is durable and queryable. The brain can rewind, slice, or replay events as needed.
  • Sandbox: provision({resources}), execute(name, input). Containers are provisioned only when the brain decides it needs hands. If a container dies, the harness catches the failure and retries.
  • Harness: wake(sessionId). The harness itself is stateless. When it crashes, a fresh one wakes up, reads the session log, and resumes from the last event.
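The stateless-harness idea in the list above is easier to see in code. The following is an in-memory sketch, not the Managed Agents API: emit_event, get_events, and wake are Python stand-ins loosely named after the operations listed, and PLAN is a made-up fixed task list. The point is that the harness keeps no progress of its own and derives it from the log every time it wakes.

```python
from collections import defaultdict

# Stand-in for the durable, hosted session store (in-memory for the demo).
_store = defaultdict(list)

# Hypothetical fixed plan; a real harness would ask the model what to do next.
PLAN = ["read auth.py", "edit auth.py", "run tests"]


def emit_event(session_id, event):
    _store[session_id].append(event)


def get_events(session_id):
    return list(_store[session_id])


def wake(session_id, crash_after=None):
    """Start a fresh harness, work out progress from the log, and continue.

    Returns how many steps this harness instance completed. crash_after
    simulates the harness dying after that many steps.
    """
    done = sum(1 for e in get_events(session_id) if e["type"] == "step_done")
    completed = 0
    for step in PLAN[done:]:
        if crash_after is not None and completed == crash_after:
            raise RuntimeError("harness crashed")
        emit_event(session_id, {"type": "step_done", "step": step})
        completed += 1
    return completed


try:
    wake("s1", crash_after=1)  # first harness dies after one step
except RuntimeError:
    pass
resumed = wake("s1")  # a new harness picks up from the log
print(resumed, [e["step"] for e in get_events("s1")])
```

The second harness completes only the remaining steps, and no step is recorded twice.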
Custom tools work through MCP servers, with OAuth tokens stored in a secure vault outside the sandbox. Claude calls MCP tools through a dedicated proxy that fetches credentials from the vault and makes the external call. The harness never holds the credentials; the sandbox never has access to them. This is the security boundary Anthropic specifically engineered for: a prompt injection inside the sandbox can't reach the tokens that would let it spawn unrestricted sessions.
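The credential boundary described above can be sketched as follows. Vault, ToolProxy, fake_transport, and the service name are all hypothetical; this is not Anthropic's implementation. In a single Python process the separation is by convention only, whereas the real design enforces it across process and network boundaries. The sketch shows only the shape: sandboxed code receives a callable, and the token is attached on the far side of it.

```python
class Vault:
    """Holds secrets outside the sandbox. Only the proxy gets a reference."""

    def __init__(self, tokens):
        self._tokens = dict(tokens)

    def token_for(self, service):
        return self._tokens[service]


class ToolProxy:
    """Makes external calls on the agent's behalf and attaches credentials itself."""

    def __init__(self, vault, transport):
        self._vault = vault
        self._transport = transport

    def call(self, service, request):
        headers = {"Authorization": f"Bearer {self._vault.token_for(service)}"}
        response = self._transport(service, request, headers)
        # Return only status and body, so the token is never echoed back.
        return {"status": response["status"], "body": response["body"]}


def fake_transport(service, request, headers):
    """Stand-in for a real network call to the external service."""
    authorized = headers.get("Authorization", "").startswith("Bearer ")
    status = 200 if authorized else 401
    return {"status": status, "body": f"{service} handled {request['action']}"}


def sandboxed_agent_step(call_tool):
    """Code inside the sandbox sees a callable, never the vault or the token."""
    return call_tool("issue-tracker", {"action": "create_ticket"})


proxy = ToolProxy(Vault({"issue-tracker": "secret-token"}), fake_transport)
result = sandboxed_agent_step(proxy.call)
print(result)
```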
What you get out of it
A few things you stop having to think about:
  • Recovery and restarts. Sessions survive harness crashes, container failures, and network blips. The durable event log means nothing in flight is lost when something restarts.
  • Sandbox operations. Container provisioning, lifecycle, isolation, networking, all handled. You don't run Docker, you don't manage Kubernetes, you don't think about how containers are scoped per session.
  • Session storage. The event log is hosted. You can query it through the API rather than running your own event store.
  • Security boundaries. The auth-token-in-vault pattern is built in. You don't have to design the credential isolation yourself.
  • Scaling characteristics. p50 time-to-first-token dropped roughly 60% versus the older coupled architecture, p95 dropped over 90%, per Anthropic's own measurements. The hosted architecture is optimized for many concurrent agents.
What you give up
Hosted means hosted. The session log lives on Anthropic's infrastructure. The sandbox runs there. For some teams and some kinds of work, this is fine (most teams already trust Anthropic with their conversations). For others, particularly in regulated industries or organizations with strict data-residency requirements, it's a serious consideration. Managed Agents supports connecting to resources in your own VPC, but the harness still runs on Anthropic's side.
The other thing you give up is some of the immediacy of in-process execution. With the SDK, the agent reads a file by making a direct system call. With Managed Agents, the harness calls into a sandbox over a service boundary. For most work this is invisible; for latency-sensitive agent loops with rapid tool calls it might matter.
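A back-of-the-envelope calculation shows when the service boundary starts to matter. The 25 ms per-call overhead below is a placeholder chosen for the arithmetic, not a measured figure for Managed Agents; substitute your own measurement.

```python
def added_latency_s(tool_calls: int, per_call_overhead_ms: float) -> float:
    """Total extra wall-clock time from crossing a service boundary on every tool call."""
    return tool_calls * per_call_overhead_ms / 1000.0


# Assumed overhead of 25 ms per call, for illustration only.
for calls in (10, 200, 2000):
    print(calls, "tool calls ->", added_latency_s(calls, per_call_overhead_ms=25.0), "s added")
```

At a handful of calls the overhead disappears next to model inference time; at thousands of rapid calls it becomes a visible share of the run.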
What you actually write
[Interactive comparison: SDK vs Managed Agents, the developer surface. Same agent capability, two operational shapes. Three tasks (find and fix a bug, resume a session, long-running task) are shown side by side: Agent SDK (Python), in your process, and Managed Agents (REST), on Anthropic's infra, each with a note on what's happening.]
      Comparing the two, on the dimensions that matter
      The two products solve the same underlying problem (run a Claude agent loop that does work) at different operational layers. Practitioners moving between them usually decide on five dimensions: where the work lives, who operates the infrastructure, how state is managed, what happens when something fails, and what you can do about custom tools.
      A few key distinctions worth holding in mind:
      • Custom tools work differently. In the SDK, custom tools are in-process Python or TypeScript functions; you wire them up directly. In Managed Agents, custom tools work through MCP servers, with Claude triggering the tool and your service returning results.
      • Failure recovery is different. The SDK gives you a single process; recovery is your design. Managed Agents gives you durable session logs and stateless harnesses by design.
      • Latency profiles differ. SDK: direct syscalls for file edits, near-zero overhead inside the process. Managed Agents: every tool call crosses a service boundary, but inference can start before any container is provisioned, which often nets out faster for cold-start agents.
      • Authentication shape differs. The SDK uses your API key (or third-party providers like Bedrock, Vertex, Foundry); credentials live in your process. Managed Agents uses session-scoped tokens and an external vault; credentials never reach the sandbox.
      The visual below makes the comparison explicit across the dimensions that matter for the practical "which one do I reach for" decision.
      [Interactive comparison: SDK vs Managed Agents, two ways to run the same agent loop.
      Option A, Agent SDK: in your process, on your infrastructure. Python and TypeScript libraries that run the same agent loop as Claude Code in your own application, against files and services you control.
      Option B, Managed Agents: Anthropic's hosted REST API. A meta-harness: Anthropic operates the brain, hands, and session log. Your application sends events and streams results back.
      Dimensions compared: where it runs, interface, session state, custom tools, best fit.]
      Which should you reach for?
      The right answer for most teams shipping production agents in 2026 is to use both, at different stages and for different purposes. Anthropic's own published guidance puts it this way: "a common path is to prototype with the Agent SDK locally, then move to Managed Agents for production." That recommendation is reasonable for most teams, and it deserves to be the default rather than a fallback.
      The case for starting with the SDK
      The SDK has the tighter iteration loop. You can write an agent, run it against your local files, watch every tool call happen, and rewrite it ten times in an afternoon. The session log lives on your filesystem where you can read it. The hooks let you inspect every step. Everything is in your process, which means everything is debuggable with the same tools you use for the rest of your code.
      For learning what your agent should actually do, this is hard to beat. The questions that matter early ("what tools does it need?", "what permissions are appropriate?", "where does it get confused?") are answered fastest by direct experimentation. Once you know the shape of the agent, moving it to production is comparatively mechanical. Building the agent in production is much harder than moving a built agent to production.
      The case for graduating to Managed Agents
      Managed Agents is the right answer when the operational characteristics of the agent become more important than the development loop. A few patterns where it earns its keep:
      • Long-running asynchronous work. Agents that run for hours, or that need to survive your application restarting. The durable session log and stateless harness handle this by design.
      • Many concurrent agents. Multi-tenant applications where each user has their own agent session. Operating sandbox infrastructure for hundreds of sessions is a job; Managed Agents removes it.
      • Security-sensitive integration. Agents that need to call external services with OAuth tokens. The vault pattern keeps credentials out of the sandbox, which is hard to get right yourself.
      • Operations you don't want to own. Container provisioning, lifecycle management, isolation between sessions, crash recovery, network configuration. All of these are real work that's worth handing off.
      When to stay on the SDK
      There are also reasons not to graduate, and they're worth naming. If your agent lives entirely inside your own infrastructure, talks only to your own services, and has data-residency requirements that prevent hosting on Anthropic's platform, the SDK is the right answer. If your agent is part of a CI/CD pipeline where the process lifecycle is already managed by your CI system, hosting it elsewhere adds complexity without much benefit. If you've already built sandbox and session infrastructure for other reasons and the agent slots into it naturally, you may not need Managed Agents at all.
      The decision helper below walks through the most common combinations.
      [Interactive decision helper: SDK or Managed Agents? Three questions, one path. Most teams end up using both at different stages. This walks through which one fits your specific situation right now.]
      Question 1 of 3: What stage is the agent in?
      • Prototyping: still learning what it should do
      • Production: behavior is well understood, now operational
      Question 2 of 3: What does the runtime look like?
      • Short-lived: finishes inside a single process invocation
      • Long-running, async, or many concurrent sessions
      Question 3 of 3: Where does the data live?
      • On your infrastructure, can't leave (residency, regulation, VPC)
      • Flexible: Anthropic-hosted is acceptable
      The right tool depends on where the agent is in its lifecycle, how long it runs, and where the work needs to happen.
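The three questions above can be folded into a small function. The branch order and the outcomes below are one reading of this article's reasoning, not the logic of the interactive helper or an official Anthropic rule. In particular, the last branch (short-lived production work with flexible data placement) is an assumption based on the CI/CD argument made earlier.

```python
from enum import Enum


class Choice(Enum):
    SDK = "Agent SDK"
    MANAGED = "Managed Agents"


def recommend(prototyping: bool, long_running_or_concurrent: bool, data_must_stay: bool) -> Choice:
    """Map the three questions to a starting recommendation (illustrative only)."""
    if prototyping:
        return Choice.SDK  # tight iteration loop, full visibility
    if data_must_stay:
        return Choice.SDK  # residency or regulation rules out hosting elsewhere
    if long_running_or_concurrent:
        return Choice.MANAGED  # durable sessions and hosted sandboxes earn their keep
    return Choice.SDK  # assumption: short-lived jobs whose lifecycle is already managed


print(recommend(prototyping=False, long_running_or_concurrent=True, data_must_stay=False).value)
```

Treat the output as a starting point for discussion; most teams will land on different answers for different agents.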
      Why infrastructure choices are methodology choices
      The framing Anthropic uses for Managed Agents has an interesting property: the brain/hands/session decoupling isn't really an infrastructure pattern. It's a methodology pattern, dressed up as infrastructure. The discipline of separating what reasons from what acts from what remembers is what makes the system recoverable, auditable, and replaceable in pieces. Without that separation, you have a "pet": named, hand-tended, fragile. With it, you have something that scales.
      The same pattern shows up in Generative-Driven Development at the methodology layer rather than the platform layer. A GenDD Pod separates the same three components, just for human-AI teams instead of single agents:
      • The reasoning layer. The plans, the prompts, the methodology itself. This is the brain in HatchWorks vocabulary: it can change as understanding improves.
      • The execution layer. The Context Packs, Skills, agents, MCP tools, and connected services that do the work. These are the hands: replaceable, scoped, permission-limited.
      • The audit layer. The decision log, the status files, the record of what happened. This is the session: durable, queryable, useful for understanding what was done and why.
      The reason the same architectural separation shows up at both layers is that the underlying problem is the same. Anytime you have intelligent work happening across multiple components, you want those components to be independently replaceable. You want the failure of one to not bring down the others. You want the durable record to survive the death of the process that produced it. These are good design properties regardless of whether you're building a multi-agent system or running a software development practice.
      For teams thinking about how to ship production agents, the practical implication is that infrastructure choices and methodology choices are coupled. Choosing the Agent SDK without the methodology discipline produces agents that work in development and break in production. Choosing Managed Agents without the methodology discipline produces a hosted service running fragile work, with the hosting making the fragility harder to debug. Both products are designed to support good methodology; neither one substitutes for it.
      The question for any team deploying agents at scale isn't just SDK or Managed Agents. It's whether the team has the methodology layer that makes either of them work. That layer is what we've been building at HatchWorks for the last two years, and it's the layer that turns "we have agents in production" into "we have agents in production that we trust."
      HatchWorks AI
      Build production agents with the methodology layer that makes either choice work
      HatchWorks AI helps engineering organizations design, build, and operate Claude-powered agents in production, whether you're prototyping with the Agent SDK, running on Managed Agents, or doing both at different stages. Generative-Driven Development is the methodology layer; agents are one of the artifacts it produces. If you're trying to figure out where to run your agents, how to govern them, or how to ship them safely at scale, we can help.