Most automation tells a system exactly what to do: run these steps, in this order, every time. Goal automation is different. You hand an agent an objective, not a script, and it decides the steps for itself, adapting as it learns what the situation actually requires. This is the capability the rest of this series has been building toward, and it is also the one that demands the most discipline, because an agent that chooses its own path is only as safe as the boundaries you put around it.
Here we ask the real question: how do you give an agent a goal and trust the result?
What the Agent SDK is
The Claude Agent SDK is Anthropic's library for building agents in Python and TypeScript. It exposes the same engine that runs Claude Code, so an agent you build with it can read files, run commands, search the web, and edit code out of the box, with the tool loop, context management, and compaction handled for you. The practical headline is that a working agent is a function call plus an options object, around ten lines, because everything underneath is already built. The library itself is free; you pay for the model tokens it uses, or draw on your plan's Agent SDK allowance.
One history note saves a lot of confusion. The SDK launched as the Claude Code SDK and was renamed the Claude Agent SDK in late 2025, and the options type changed at the same time. Older tutorials that import the previous package or options class will not run against current versions, so verify any example against the current documentation before copying its imports. With that out of the way, the rest of this guide is about the surfaces you actually configure.
Your agent, in a few lines
Every interaction starts with the query function. You pass it a prompt and an options object, and it returns a stream of messages as the agent works: a system message that carries the session identity, assistant messages as it reasons and calls tools, and a final result message with the output and usage. The loop you would otherwise build by hand is entirely inside that call.
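As a minimal sketch of that call, the snippet below assumes the Python package is importable as `claude_agent_sdk` and exposes `query` and `ClaudeAgentOptions`, with option names `allowed_tools`, `permission_mode`, and `max_turns`. Those names are assumptions to verify against the current documentation. The helper `run_agent` and the prompt are purely illustrative.

```python
import asyncio

# Assumed option names; check them against the current SDK reference.
OPTION_KWARGS = {
    "allowed_tools": ["Read", "Grep", "Glob"],  # pre-approve read-only tools
    "permission_mode": "plan",                  # plan without making changes
    "max_turns": 10,                            # stop a confused agent early
}


async def run_agent(prompt: str) -> list[str]:
    """Stream one query and return the class name of each message received."""
    # Imported lazily so this file still loads where the SDK is not installed.
    from claude_agent_sdk import ClaudeAgentOptions, query

    seen = []
    options = ClaudeAgentOptions(**OPTION_KWARGS)
    async for message in query(prompt=prompt, options=options):
        seen.append(type(message).__name__)
    return seen


# Usage (needs the SDK installed and credentials configured):
# print(asyncio.run(run_agent("Summarize the open TODOs in this repository")))
```

Printing the message class names is a quick way to see the system, assistant, and result messages arrive in order before you write any handling logic.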
The behavior of that agent is set entirely by the options. Change the allowed tools and you change what it can touch; change the permission mode and you change how freely it acts. Four settings do most of the work: the allowed tools, the permission mode, the maximum number of turns, and any external tools you connect.
Notice how much of an agent's safety and scope is just configuration. A read-only allowlist with plan mode is an analyst that cannot change anything; add Edit and switch to acceptEdits and the same skeleton becomes a code-fixing agent. The skill is in choosing the least privilege that still does the job.
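The analyst-to-fixer shift can be sketched with a plain dataclass. `AgentProfile` and `pre_approved_writes` are hypothetical stand-ins for illustration, not SDK types, and the model is deliberately simplified: it only looks at which write-capable tools are pre-approved and whether the mode is plan.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AgentProfile:
    """Hypothetical stand-in for an SDK options object."""
    allowed_tools: tuple[str, ...]
    permission_mode: str
    max_turns: int = 10


WRITE_TOOLS = {"Edit", "Write", "Bash"}


def pre_approved_writes(profile: AgentProfile) -> set[str]:
    """Write-capable tools that would run unprompted under this profile."""
    if profile.permission_mode == "plan":
        return set()  # plan mode: nothing is changed, whatever the allowlist says
    return WRITE_TOOLS & set(profile.allowed_tools)


# Read-only analyst: cannot change anything.
analyst = AgentProfile(allowed_tools=("Read", "Grep", "Glob"), permission_mode="plan")

# Same skeleton, one more tool and a looser mode: a code-fixing agent.
fixer = replace(
    analyst,
    allowed_tools=analyst.allowed_tools + ("Edit",),
    permission_mode="acceptEdits",
)
```

Deriving the second profile from the first with `replace` keeps the difference between the two agents visible in one place, which makes a privilege increase easy to review.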
Three layers of tools
An agent is only as capable as its tools, and the SDK gives you three layers to work with. The built-in tools ship with the SDK and need no implementation: reading, writing, and editing files, running shell commands, and searching the filesystem and the web. Custom tools are your own functions, defined in process and offered to the agent through a lightweight in-process server, which is how you let it record a structured finding, write to your data layer, or call your own logic without standing up a separate service. External MCP servers connect the agent to outside systems and live data through one standard interface, the same Model Context Protocol used everywhere else in the stack.
One point of precision saves real debugging time. Listing a tool in the allowed set pre-approves it so it runs without a prompt; it does not control which tools exist. To remove a capability you disallow it explicitly. That distinction is the seam between availability and permission, and it is exactly where the control layer picks up. The procedural knowledge for using these tools well belongs in a Skill, as covered in Claude Skills architecture, which the SDK can load the same way Claude Code does.
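The difference between availability and permission is easier to see as a small decision function. This is a simplified model written for illustration, not the SDK's own resolution logic, and `resolve` and `BUILT_IN` are hypothetical names.

```python
def resolve(tool: str, available: set[str], allowed: set[str], disallowed: set[str]) -> str:
    """Classify a tool call under a simplified model of allow and deny lists."""
    if tool in disallowed or tool not in available:
        return "unavailable"           # removed explicitly, or never existed
    if tool in allowed:
        return "runs without prompt"   # pre-approved
    return "needs a decision"          # still exists; falls to the mode or callback


# Illustrative subset of built-in tool names.
BUILT_IN = {"Read", "Edit", "Bash", "WebSearch"}

# A short allowlist alone leaves Bash reachable.
print(resolve("Bash", BUILT_IN, allowed={"Read"}, disallowed=set()))
# Only an explicit deny removes it.
print(resolve("Bash", BUILT_IN, allowed={"Read"}, disallowed={"Bash"}))
```

The first call is the case that surprises people: Bash is not on the allowlist, yet it is still a tool the agent can ask to use.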
Permissions, control, and verification
Because you are building the harness rather than driving it, control becomes programmatic. The first layer is the permission mode, which sets the default posture from read-only planning to fully unattended. The second is the allow and deny lists, which pre-approve safe tools and refuse dangerous ones by name. The third, and the one unique to building your own harness, is the permission callback: a function that runs for any tool call not already settled by the lists, and returns a decision in code. It can allow the call, deny it with a message, or allow it with a modified input, which lets you narrow what a tool is permitted to do rather than only whether it runs.
Consider a callback that allows reads, blocks anything touching secrets, and refuses destructive commands as defense in depth. For each tool call that arrives, the question is the same: what does the callback return?
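One way such a callback might look is sketched below as a pure function. The return shape (a dict with `behavior`, `message`, and `updated_input`) is a hypothetical simplification of the SDK's permission result types, and the secret hints, the destructive-command pattern, and the `timeout` key are illustrative assumptions rather than a complete policy.

```python
import re

SECRET_HINTS = (".env", "secrets", "id_rsa", ".pem")
DESTRUCTIVE = re.compile(r"\brm\s+-[a-z]*r|\bmkfs\b|\bdd\s+if=", re.IGNORECASE)
READ_ONLY_TOOLS = {"Read", "Grep", "Glob"}


def decide(tool_name: str, tool_input: dict) -> dict:
    """Return allow, deny with a message, or allow with a narrowed input."""
    text = " ".join(str(value) for value in tool_input.values())

    # Secrets are refused first, even for tools that are otherwise read-only.
    if any(hint in text for hint in SECRET_HINTS):
        return {"behavior": "deny", "message": "touches a secrets path"}

    # Defense in depth: refuse destructive shell commands outright.
    if tool_name == "Bash" and DESTRUCTIVE.search(text):
        return {"behavior": "deny", "message": "destructive command"}

    if tool_name in READ_ONLY_TOOLS:
        return {"behavior": "allow", "updated_input": tool_input}

    if tool_name == "Bash":
        # Allow with a modified input: narrow the call instead of refusing it.
        return {"behavior": "allow", "updated_input": {**tool_input, "timeout": 30_000}}

    return {"behavior": "deny", "message": f"{tool_name} is not pre-approved"}
```

Checking secrets before the read-only shortcut matters: with the order reversed, a `Read` of `.env` would be allowed before the secrets rule ever ran.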
The deepest control is the hook: your own code wired to fixed points in the lifecycle, such as before or after every tool call. A pre-tool hook runs regardless of what the model decided and can block an action outright, which makes it the most reliable guardrail in the system because it is deterministic code rather than model judgment. Permission modes set intent, the callback decides the gray areas, and hooks enforce the rules that must always hold. Together they are the control responsibility from the pillar, now expressed as software you own.
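The idea of a pre-tool hook can be sketched without the SDK at all. Everything here is hypothetical scaffolding (`PRE_TOOL_HOOKS`, `pre_tool`, `run_tool`); the real SDK has its own hook registration and return format. The point is only that the check is ordinary code that runs before the tool, whatever the model decided.

```python
from typing import Callable, Optional

# A hook returns a reason to block the call, or None to let it through.
Hook = Callable[[str, dict], Optional[str]]

PRE_TOOL_HOOKS: list[Hook] = []


def pre_tool(hook: Hook) -> Hook:
    """Register a function to run before every tool call."""
    PRE_TOOL_HOOKS.append(hook)
    return hook


@pre_tool
def protect_main_branch(tool_name: str, tool_input: dict) -> Optional[str]:
    command = tool_input.get("command", "")
    if tool_name == "Bash" and "git push" in command and "main" in command.split():
        return "pushing to main is never allowed"
    return None


def run_tool(tool_name: str, tool_input: dict, execute: Callable[[str, dict], str]) -> str:
    """Run every pre-tool hook; execute the tool only if none of them blocks."""
    for hook in PRE_TOOL_HOOKS:
        reason = hook(tool_name, tool_input)
        if reason is not None:
            return f"blocked: {reason}"
    return execute(tool_name, tool_input)
```

Because the hook is plain code, it can be unit tested like any other function, which is not true of a rule written into a prompt.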
Context and sessions
By default each query starts fresh with no memory of the last one. To carry a conversation forward you either continue the most recent session or resume a specific one by its identifier, which the system message hands you at the start of every run. That session identity is what lets a multi-step agent pick up where it left off, recover after a failure, and keep an auditable record of what it did. For genuinely interactive, in-process work, the SDK also offers a client object that holds a conversation open across turns rather than starting a new query each time.
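The capture-and-resume pattern can be sketched with plain dictionaries. The message shapes and the session identifier are made up for illustration, and the `resume` key mirrors what is assumed to be the SDK's option name; verify it against the current documentation before relying on it.

```python
from typing import Iterable, Optional


def find_session_id(messages: Iterable[dict]) -> Optional[str]:
    """Return the session identifier carried by the first system message."""
    for message in messages:
        if message.get("type") == "system" and "session_id" in message:
            return message["session_id"]
    return None


def next_run_options(base: dict, session_id: Optional[str]) -> dict:
    """Resume the given session if there is one; otherwise start fresh."""
    return {**base, "resume": session_id} if session_id else dict(base)


# Hypothetical message stream from a first run.
first_run = [
    {"type": "system", "session_id": "sess-123"},  # made-up identifier
    {"type": "assistant", "text": "Reading the repository..."},
    {"type": "result", "text": "done"},
]

session_id = find_session_id(first_run)
follow_up = next_run_options({"max_turns": 5}, session_id)
```

Persisting the identifier next to your own job record is what makes recovery after a failure possible: the retry resumes the session instead of starting over.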
Underneath, context is managed for you exactly as it is in Claude Code: the window accumulates as the agent works, and the SDK compacts it automatically as it fills. One configuration detail trips up many teams. Loading your project's CLAUDE.md is not automatic from the SDK; it requires both opting into project settings and using the matching preset system prompt. Miss either and the agent simply will not see your conventions, which is the single most common reason an SDK agent ignores rules a Claude Code session would have followed.
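A small check makes the two-part requirement explicit. The option keys `setting_sources` and `system_prompt`, and the preset name `claude_code`, are assumptions based on the Python SDK's naming and should be verified against the current documentation. `loads_project_instructions` is a hypothetical helper, not part of the SDK.

```python
def loads_project_instructions(options: dict) -> bool:
    """True only if project settings are opted into AND a preset prompt is used."""
    sources = options.get("setting_sources") or []
    prompt = options.get("system_prompt")
    uses_preset = isinstance(prompt, dict) and prompt.get("type") == "preset"
    return "project" in sources and uses_preset


# Both halves present: the agent should see the project's conventions.
complete = {
    "setting_sources": ["project"],
    "system_prompt": {"type": "preset", "preset": "claude_code"},
}

# Project settings opted in, but a custom string prompt replaces the preset.
missing_preset = {
    "setting_sources": ["project"],
    "system_prompt": "You are a code reviewer.",
}
```

Running a check like this in a startup test catches the silent failure before the agent runs without its conventions.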
HatchWorks AI is an Official Anthropic Claude Partner. Our Anthropic-certified Forward Deployed Engineers deploy Claude into your business and make it stick.
SDK or Claude Code
They are the same engine, so the choice is about where the agent needs to live, not which is more capable. Claude Code is the right tool when a person is in the loop at a terminal. The SDK is the right tool when the agent has to run somewhere a terminal is not: inside your product, behind your own interface, or in an automated pipeline.
| Question | Claude Code | Agent SDK |
|---|---|---|
| Who operates it | A person at a terminal | Your code |
| Where it runs | Terminal, desktop, editor | Your app, service, or pipeline |
| Best for | Hands-on, interactive work | Embedded and unattended agents |
| Control surface | Modes, rules, hooks, config files | The same, expressed in code |
Because the engine is shared, work does not get thrown away when you move between them. The Skills, the MCP tools, and the hook logic you build for one carry to the other, so prototyping a workflow interactively in Claude Code and then productizing it through the SDK is a natural progression rather than a rewrite.
From prototype to production
A demo agent and a production agent differ mostly in their guardrails, and the SDK gives you the ones that matter. Two options should never be omitted from an unattended run: a turn limit, so a confused agent fails fast instead of looping forever, and an explicit permission posture, so nothing dangerous runs by default. A spend cap adds a hard ceiling on cost for agents that could otherwise make many calls. The discipline is least privilege: start an agent read-only, then grant the narrowest set of tools and the tightest mode that still lets it finish the job.
| Step | Focus | Practice |
|---|---|---|
| Cap it | Turns and budget | Set a turn limit and a spend ceiling on every unattended run. |
| Least privilege | Narrow tools | Grant the smallest allowlist and tightest mode that still works. |
| Enforce | Hooks and deny rules | Put must-never-happen rules in code, not in instructions. |
| Host it | Deploy or manage | Run it in your own infrastructure or on managed agents. |
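The caps and the explicit permission posture can be enforced with a preflight check that runs before any unattended launch. This is a sketch under stated assumptions: `preflight` is a hypothetical helper, the options are a plain dict, and the keys `max_turns`, `permission_mode`, `max_budget_usd`, and `disallowed_tools` are assumed to mirror SDK option names.

```python
def preflight(options: dict, unattended: bool = True) -> list[str]:
    """Return the guardrails missing from an options dict; empty means ready."""
    problems = []
    if unattended and not options.get("max_turns"):
        problems.append("no turn limit")
    if unattended and not options.get("permission_mode"):
        problems.append("no explicit permission mode")
    if unattended and options.get("max_budget_usd") is None:
        problems.append("no spend cap")
    # A fully unattended mode with no deny rules has nothing enforcing limits.
    if options.get("permission_mode") == "bypassPermissions" and not options.get("disallowed_tools"):
        problems.append("bypass mode with no deny rules")
    return problems


ready = {"max_turns": 20, "permission_mode": "acceptEdits", "max_budget_usd": 2.0}
```

Failing the launch when the returned list is non-empty turns the checklist into something a pipeline can enforce rather than something a reviewer has to remember.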
When you are ready to run agents at scale, you can host the SDK in your own environment, authenticating against the API directly or through a cloud provider, or move to managed agents for hosted sandboxes and sessions you would rather not operate yourself. Our guide to the Claude Agent SDK and managed agents goes deeper on the production and hosting path.
Common pitfalls
The SDK is straightforward to start with and easy to misconfigure in ways that only bite in production. These are the recurring ones.
The SDK was renamed, and guides written before the change import a package and options class that no longer exist. If example code fails on its imports, that is almost always why. Check against the current documentation.
Without a turn cap a confused agent can loop until it exhausts its context or your budget. Set a turn limit and a spend ceiling on every run that is not being watched by a human.
The allowed list pre-approves tools so they skip the prompt; it does not define which tools exist. To actually remove a capability you must disallow it. Assuming a short allowlist is a sandbox is a real security gap.
From the SDK, project instructions are not read automatically. You have to opt into project settings and use the matching preset system prompt. Miss either and the agent ignores conventions you assumed it had.
A rule in the system prompt is followed most of the time, not always. Anything that must never happen belongs in a deny rule or a hook, where it is enforced by code rather than by the model's good behavior.
From goal automation to a methodology
Everything in this series points to the same conclusion. The model is remarkable, the harness is capable, and the tools are powerful, but none of that decides whether autonomous work can be trusted. What decides it is discipline: choosing the right goals, specifying them with a success check and clear boundaries, building verification in layers, and drawing the human and AI line before the agent runs rather than after it goes wrong. Capability is necessary. It is not sufficient.
That discipline is what a methodology is for, and it is why every article in this cluster has ended in the same place. At HatchWorks, Generative-Driven Development turns these practices into a repeatable system: Skills codify method, the harness runs it, the GenDD Execution Loop wraps the agent loop in planning and verification, and the Three-Tier Human and AI Boundary Model keeps people in control of the decisions that matter. Goal automation is the payoff the whole series was building toward, and a methodology is what turns it from an impressive demo into work your organization can depend on. That is the work we do with engineering teams every day.
You've seen how to build your own agent with the SDK. Our FDEs make the production calls on control, verification, and deployment, so what you build is dependable, not just a demo.