Claude Skills Architecture: How to Design Skills That Scale

Andy Smith
June 18, 2026

Updated: July 1, 2026

Most teams meet Claude Skills as a single folder that teaches Claude how to do one thing well. That framing is correct, and it is also where most implementations stop. The harder and more valuable question is architectural: how do you design a system of Skills that stays lean as it grows, routes reliably to the right capability, and travels across every surface your team builds on. This guide treats Skills as an architecture rather than a feature, and walks through the design decisions that separate a handful of one-off folders from a durable capability layer.

If you are new to the format itself, our primer on Claude Skills covers the basics of what a Skill is and how to create your first one. This article assumes you are past that point and ready to think about composition, context economics, and the boundaries between Skills and the rest of the Claude stack.

What this guide covers

Skills are the method layer
The three levels of progressive disclosure
Level 1: the description is your router
Level 2: keep the body lean
Level 3: documents to read, code to run
The decision boundary: Skill, prompt, Project, MCP, or subagent
Composition: the layers are meant to stack
Common architecture mistakes
From a system of Skills to a methodology

Skills are the method layer, and that changes how you design them

A Skill is a directory containing a SKILL.md file, optional scripts, and optional reference material that Claude discovers and loads when a task calls for it. Anthropic describes Skills as modular capabilities that package instructions, metadata, and resources Claude uses automatically when relevant. The architectural point hiding inside that definition is that Skills occupy a specific layer in the stack. Projects hold the background knowledge about a body of work. The Model Context Protocol connects Claude to live systems and data. Prompts carry the immediate, conversational instruction for one task. Skills sit between them as the method layer: the codified, reusable knowledge of how a task is done correctly, every time.

Designing at the method layer means optimizing for three properties that one-off folders rarely consider. The first is discovery, because a Skill that Claude does not reliably select is dead weight. The second is context economy, because a capability layer that loads everything it knows on every task will exhaust the very resource it was meant to protect. The third is portability, because the value of codified method compounds only when the same Skill works across Claude.ai, Claude Code, and the Agent SDK without modification. Progressive disclosure is the mechanism that delivers all three, so it is where any serious architecture starts.

The three levels of progressive disclosure

Progressive disclosure is the core design principle of Skills, and it is best understood as a three-level loading model. Claude reveals information in stages as a task requires it, rather than consuming context up front. Each level has a different trigger and a different cost.

Level 1
Metadata

YAML frontmatter (name and description)

Loaded into the system prompt at startup for every installed Skill. This is how Claude knows a Skill exists and when it might apply, without reading any of its instructions.

Always loaded. A few tokens per Skill.

Level 2
Instructions

The SKILL.md body

The full procedural knowledge: workflows, steps, and guidance. Claude reads this only after the description matches the task at hand.

Loaded on match. Anthropic suggests keeping it under 500 lines.

Level 3
Resources

Bundled files and scripts

Reference documents the body points to, plus scripts Claude can execute rather than read. A linked reference is loaded only when its path is needed. A script can run without ever entering the context window.

Loaded or executed on demand. Zero cost until accessed.

The clearest way to feel why this matters is to watch the context window as a single task moves through the levels. Step through the sequence below.

Watch the context window fill Step 1 of 5

Context used by the pdf-processing Skill

Striped bar: what one Skill would cost if everything loaded up front.

Back

Next step

Reset

This model is what lets a team install many Skills without paying a context penalty for each one. At startup, Claude carries only the lightweight metadata. The expensive material stays on the filesystem until a task actually reaches for it. Every architectural decision that follows is really a decision about which level a given piece of knowledge belongs in.

Level 1: the description is your router, so design it like one

The single highest leverage design decision in a Skill is the description in its frontmatter. Claude decides whether to engage a Skill almost entirely from this field, so it functions as a router rather than a label. A vague description quietly fails by never triggering, which is the most expensive failure mode because it produces no error, just an unused capability and an agent that falls back on improvisation.

--- minimal frontmatter that routes well --- name: invoice-reconciliation description: Reconciles vendor invoices against purchase orders and flags mismatches. Use when the user mentions invoices, reconciliation, accounts payable, PO matching, or uploads billing documents.

A strong description does two jobs at once. It states what the Skill does, and it names the conditions under which Claude should reach for it, including the words a user is likely to use. Anthropic's own guidance for Skill authoring recommends writing the description in the third person and being explicit about both the capability and its triggers. The discipline here mirrors good API design: the interface has to make the right call obvious from the outside. The description field has a hard ceiling of 1,024 characters, which is generous enough for precise triggering and tight enough to force clarity.

Architecture test for any description: read only the name and description, with no access to the body, and decide whether you could correctly route ten realistic requests to the right Skill. If you cannot, neither can Claude. Routing accuracy is set at Level 1, not in the instructions.

Level 2: keep the body lean and load the rest

The SKILL.md body holds the procedural core: the steps, the workflow, the rules that should hold every time. The design constraint is length. Anthropic recommends keeping the body under 500 lines, and its own Skill development guidance targets roughly 1,500 to 2,000 words for the main file. The reason is not stylistic. The body loads in full whenever the Skill triggers, so every paragraph you keep in it is context you spend on every invocation, whether or not that particular task needs it.

The architectural move is to treat the body as an index that points outward. Core instructions stay inline. Anything that is detailed, occasional, or mutually exclusive with another path moves into a separate file that the body references and Claude loads only when the task reaches it. A PDF Skill keeps its quick start in the body and pushes the form filling guide into a separate file, trusting Claude to read it only when a form is actually involved. This is the same instinct as splitting a large module into focused files: you are managing the agent's attention, not just its file count.

pdf-processing/ SKILL.md # lean core, loaded on trigger reference.md # API detail, loaded on demand forms.md # form workflow, loaded only when filling forms scripts/ extract.py # executed as a tool, never read into context validate.py

Level 3: documents to read, code to run

The third level carries two different kinds of resource, and conflating them is a common architectural error. Reference documents are written for Claude to read into context when a task needs that detail, so a comprehensive API spec or a large style guide can live here at no cost until the moment it is opened. Scripts are written for Claude to execute, not to read. A sorting routine, a validator, or a form field extractor returns a deterministic result without ever consuming context, which is exactly what you want for any operation where reliability matters more than reasoning.

A useful rule when authoring is to be explicit in the body about which is which. State plainly whether Claude should run a script or read a file, because the same Python file can serve as an executable tool in one Skill and as documentation in another. The architectural payoff is that you can bundle very large resources, complete reference manuals, extensive examples, sizable datasets, with no startup penalty, because none of it touches the context window until a task actually requires it.

HatchWorks AI is an Official Anthropic Claude Partner. Our Anthropic-certified Forward Deployed Engineers deploy Claude into your business and make it stick.

See how our FDEs work →

The decision boundary: Skill, prompt, Project, MCP, or subagent

The most consequential architecture work is not inside a single Skill. It is deciding what should be a Skill at all. The Claude stack offers five building blocks that beginners routinely confuse because they all make Claude more capable. They operate at different layers, and putting knowledge in the wrong one is how teams end up rewriting a ten minute Skill as a forty hour service, or building a Skill that tries to query a database it can never reach.

Which best describes what you are trying to do?

I am giving a one-time instruction for the task in front of me.

I need this done the same correct way every time, across conversations.

Claude needs standing background context about one specific initiative.

Claude needs to reach live data or act on an external system.

I need isolated or parallel execution with its own tools and context.

Building block	What it is	Lifespan	Reach for it when	Architectural role
Prompt	A one-time, conversational instruction	Ephemeral, single turn	You are giving immediate guidance or refining a result	Reactive direction
Skill	A folder of reusable instructions, resources, and scripts	Persistent, across conversations	You need a task done the same correct way every time	Method, the how
Project	A workspace with background knowledge and instructions	Persistent, scoped to one initiative	Claude needs standing context about a body of work	Context, the what
MCP server	A live connection to an external tool or data source	A running service	Claude needs to query, fetch, or act on external state	Connectivity
Subagent	A specialized agent with its own context and permissions	Per delegated task	You need isolation, independent execution, or parallelism	Division of labor

Two boundaries cause most of the confusion. The first is Skill versus MCP. If your instructions contain the words query, fetch, look up, or current state, you almost certainly need an MCP server, because a Skill is procedural knowledge and cannot reach data that does not already exist in context. MCP gets Claude to the data; a Skill tells Claude what to do with it once it is there. The two are partners, not alternatives. The second is Skill versus subagent. If the same expertise is needed by more than one agent or conversation, it belongs in a Skill rather than baked into a subagent's system prompt. The clean formulation is that subagents are the specialists and Skills are the shared library of expertise they draw on. Our deeper treatment of Claude sub-agents and agent teams covers that delegation model in full.

A fast heuristic: a prompt is a one-time instruction, a Skill is a repeatable method, a Project is persistent background context, an MCP server is a connection to external state, and a subagent is a dedicated worker with isolated context. If you have copied the same instruction block more than twice, that is the signal to promote it from prompt to Skill.

Composition: the layers are meant to stack

The reason the boundary matters is that a production system rarely uses one block in isolation. The blocks are layers, and the strongest architectures compose them. A single competitive analysis request can pass through all five.

Project

Standing context

Holds the research and instructions for this initiative.

MCP

Live data

Pulls current briefs from a drive and data from a repository.

Skill

The framework

Applies the analytical method so every analysis matches.

Subagents

Parallel work

Run discrete pieces at once, each loading the Skill it needs.

Prompt

Refinement

Steers the result for the case in front of you.

Result

One analysis

Consistent, sourced, and shaped to the question.

Each layer does something the others genuinely cannot. This composition is also where the portability of Skills pays off. The same Skill works across Claude.ai, Claude Code, and the API without modification, provided the environment supports it, which means a methodology you encode once becomes available to every agent and surface your team operates. Subagents can load Skills, Agent Teams can assign different Skills to different members, and an MCP integration becomes more complete when a Skill captures the workflow for using it well. For the surfaces side of this picture, our guides to the Claude Agent SDK and managed agents and to building agents with Claude show where Skills slot into a running harness.

Common architecture mistakes

The failure modes are predictable, and most trace back to putting knowledge at the wrong level or in the wrong block.

A weak description that never triggers

The most common and most invisible failure. The Skill is well written but its description does not name the conditions or vocabulary that would make Claude select it, so it sits unused while the agent improvises. Fix it at Level 1 by stating both what the Skill does and when to use it, in the user's words.

A bloated body that taxes every task

Stuffing every detail into SKILL.md spends context on every invocation, even tasks that need none of it. Keep the body lean, target the 500-line and roughly 1,500 to 2,000 word guidance, and move detail into referenced files.

A Skill that tries to reach live data

If a Skill instructs Claude to look up or fetch current state, it will produce confident guesses rather than facts, because procedural knowledge cannot reach data it does not have. That is an MCP responsibility. Pair the Skill with a connection instead of asking it to do connectivity's job.

Expertise hardcoded into a subagent

Baking a review checklist or analysis method into one subagent's prompt traps it there. The moment a second agent needs the same expertise you are maintaining two copies. Put shared method in a Skill and let any subagent load it.

Domain knowledge living in system prompts

A system prompt is ephemeral and tied to a single deployment. If the same expertise needs to travel across Claude.ai, Claude Code, the API, and multiple agents, it should be a Skill, which is portable by design.

From a system of Skills to an organizational methodology

Once a team is designing Skills well, a larger pattern comes into view. A mature Skill does not store facts about a single case. It stores method: the search strategy, the analysis framework, the evidence standard, the output format. Facts come from Projects and MCP. Method is what Skills codify, and codified method is what turns individual productivity into organizational capability that is consistent, auditable, and reusable across every team and surface.

That is precisely the line between a useful tool and a durable advantage, and it is where a deliberate methodology earns its place. At HatchWorks, our Generative-Driven Development approach treats this codification as a first-class discipline. Where a Skill packages method for one capability, GenDD Context Packs package the standing context and conventions an agent needs to work the way your organization actually works, so progressive disclosure stops being a per-Skill tactic and becomes a system-level design principle across your entire delivery process. Skills are the unit. A methodology is what makes the units add up.

You've seen how to architect Skills into a system that composes, routes, and scales. An FDE turns that architecture into a production capability layer your team uses every day, not a folder of prompts that goes stale.

Official Anthropic Claude Partner

Part of the Claude Partner Network, HatchWorks AI embeds Anthropic-certified Forward Deployed Engineers in your team to find where Claude delivers value, ship it into production, and help make adoption stick.

Talk to a Forward Deployed Engineer See how FDEs work

Category: Claude
Tags: agent architecture, Agentic AI, AI agents, Anthropic, Claude, Claude Skills, LLM, MCP, progressive disclosure, prompt engineering

Get the best of our content
straight to your inbox!

Don’t worry, we don’t spam!

The VC’s Lens: How AI Is Rewriting the Rules of Defensibility

Claude Goal Automation: From Doing Tasks to Reaching Goals

Claude Agent SDK: Build Your Own Agent Harness

Claude Code: The Agent Harness, Off the Shelf

Publications