Anthropic Was Right About Agent Skills
Prompts don’t make agents reliable. Structure does. A deep dive into what Anthropic got right with Agent Skills.
Why prompts were never enough for production agents
Here's something that most people building AI agents run into:
Ask the same agent the same question twice, and you won't always get the same answer.
Nothing changed. Same input, but different outputs.
This is the nature of large language models.
LLMs are probabilistic systems. They don't guarantee identical outputs for identical inputs, especially when you introduce multi-step reasoning, tools, or long conversations.
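You can watch this happen with a few lines of code. Here's a minimal sketch using Anthropic's Python SDK; the model ID is a placeholder and the prompt is invented for illustration:

```python
# pip install anthropic
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

prompt = "Write a one-sentence tagline for a note-taking app."

# Same model, same prompt, default sampling settings. Run it twice
# and the two completions will frequently differ.
for run in (1, 2):
    response = client.messages.create(
        model="claude-sonnet-4-5",  # placeholder; use your model ID
        max_tokens=100,
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"Run {run}: {response.content[0].text}")
```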
This explains why so many "impressive" agents quietly fall apart in production.
The Problem with Prompt-Only Agents
The industry's first instinct was obvious:
Add a better system prompt.
So we did:
- Longer instructions
- Stricter rules
- Examples packed into context
System prompts started looking like mini textbooks. This approach has some impact, but it runs into two hard limits.
1. Prompts Don't Enforce Behavior
Prompts suggest what an agent should do. They don't encode what it must do.
If an agent is asked to "Generate a promo video script", a prompt-only agent improvises:
- structure
- tone
- CTA
- priorities
Run it twice and you'll often get two very different answers.
The agent isn't wrong. It's inferring, because nothing tells it how this task is actually done.
2. More Context ≠ More Reliability
When prompts fail, teams add more context.
But context windows are fragile:
- too much text
- too many competing instructions
- irrelevant details always loaded

Eventually:

- token usage explodes
- reasoning degrades
- reliability still doesn’t improve
At some point, the problem becomes how knowledge is structured and loaded.
The Gap Everyone Missed

LLMs are excellent at general knowledge.
They know:
- what React is
- what marketing is
- what a promo video looks like

But production systems require procedural knowledge:

- how your team writes React
- how your company structures promos
- how decisions are made, step by step
This gap between general knowledge and procedural expertise is where most agents fail.
What Anthropic Got Right
In their engineering blog post, "Equipping agents for the real world with Agent Skills", Anthropic didn't try to solve this by "prompting harder".
They changed the unit of intelligence.
Instead of putting everything into one prompt, they introduced Agent Skills: structured packages that represent real workflows, not just instructions.
A Skill is a representation of how work is actually done.
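Concretely, a skill is just a directory. Here's a minimal sketch of one for the promo-script example used below, loosely following the SKILL.md convention from Anthropic's spec: YAML frontmatter holds the discovery metadata, the markdown body holds the instructions, and everything else is a resource. The skill's content is invented for illustration:

```python
from pathlib import Path

# A minimal, invented skill package.
# Layout: skills/promo-video-script/{SKILL.md, resources/...}
# The frontmatter (name, description) is what the agent always sees;
# the body and resources are loaded only on demand.
SKILL_MD = """\
---
name: promo-video-script
description: Writes promo video scripts using our team's structure, tone, and CTA rules.
---

# Promo Video Script

1. Open with a one-line hook (12 words max).
2. Name the problem the product solves.
3. Show the product in action, in 2-3 beats.
4. Close with the approved CTA in resources/cta.md.
"""

skill = Path("skills/promo-video-script")
(skill / "resources").mkdir(parents=True, exist_ok=True)
(skill / "SKILL.md").write_text(SKILL_MD)
(skill / "resources" / "cta.md").write_text("Start your free trial at example.com.\n")
```

The point isn't the file format. It's that structure, tone, and the CTA are now encoded, not inferred.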
Progressive Disclosure (The Core Insight)
The most important part of Anthropic's approach is progressive disclosure.
Rather than loading everything all the time, Skills are accessed in layers:
- Discovery
  - Name + description
  - Minimal tokens
  - Always present
- Instructions
  - Full workflow
- Resources
  - Scripts, templates, references
  - Accessed only when needed
This design does something subtle but critical:
It keeps the model's context small without sacrificing depth.
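Here's a rough sketch of how a host application might implement those three layers over skill packages like the one above. None of this is Anthropic's implementation; it just makes the layering concrete:

```python
from pathlib import Path

def discovery_layer(skills_root: str = "skills") -> str:
    """Layer 1: only name + description from each SKILL.md frontmatter.
    This is the cheap, always-present index the model sees every turn."""
    lines = []
    for skill_md in Path(skills_root).glob("*/SKILL.md"):
        meta = {}
        body = skill_md.read_text()
        # Naive frontmatter parse: key: value pairs between '---' markers.
        if body.startswith("---"):
            header = body.split("---", 2)[1]
            for line in header.strip().splitlines():
                key, _, value = line.partition(":")
                meta[key.strip()] = value.strip()
        name = meta.get("name", skill_md.parent.name)
        lines.append(f"- {name}: {meta.get('description', '')}")
    return "Available skills:\n" + "\n".join(lines)

def instructions_layer(name: str, skills_root: str = "skills") -> str:
    """Layer 2: the full workflow, loaded only when the skill is invoked."""
    body = (Path(skills_root) / name / "SKILL.md").read_text()
    return body.split("---", 2)[2] if body.startswith("---") else body

def resource_layer(name: str, relative_path: str, skills_root: str = "skills") -> str:
    """Layer 3: scripts, templates, references, read only when the
    instructions actually point at them."""
    return (Path(skills_root) / name / relative_path).read_text()
```

Every conversation pays for the discovery index; the token cost of the full workflow and its resources is incurred only on the turns that need them.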
Why This Changes Agent Reliability
Let's use a simple example.
Task: Generate a promo video script
Without Skills:
- the agent invents structure
- output varies between runs
- no enforced format
- no consistent decision logic
With a Skill:
- the structure is predefined
- the workflow is explicit
- edge cases are handled consistently
The agent isn't "thinking harder".
It's executing a known procedure.
That's the difference between:
- an agent that sounds smart
- an agent that behaves predictably
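Putting the pieces together, prompt assembly for this task might look like the sketch below, reusing the hypothetical discovery_layer and instructions_layer functions from earlier (again, an illustration, not Anthropic's implementation): the discovery index rides along on every request, and the full promo workflow enters context only for this task.

```python
from anthropic import Anthropic

client = Anthropic()

# Always present: the cheap index of every skill (layer 1).
system_prompt = discovery_layer()

# Loaded for this task only: the full promo workflow (layer 2).
system_prompt += "\n\nActive skill instructions:\n" + instructions_layer("promo-video-script")

response = client.messages.create(
    model="claude-sonnet-4-5",  # placeholder model ID
    max_tokens=500,
    system=system_prompt,
    messages=[{"role": "user", "content": "Generate a promo video script for our app."}],
)
print(response.content[0].text)
```

The output still comes from a probabilistic model, but the procedure it executes is now fixed: same steps, same format, same CTA source, every run.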
The Real Shift Anthropic Made
Anthropic didn't try to make agents smarter.
They made them more reliable.
They accepted the reality that:
- LLMs are non-deterministic
- prompts are advisory
- context is fragile
And they designed a system that works with those constraints instead of fighting them.
Why This Matters
Most agents today fail for the same reason: they rely on text to do the job of structure.
Agent Skills showed a different path:
- encode workflows
- load knowledge conditionally
- separate behavior from conversation
This wasn't a prompt optimization.
It was an architectural correction.
Conclusion
The industry tried to solve probabilistic systems with more text.
Anthropic didn't.
They recognized that reliable agents come from structured, procedural knowledge.
That insight is why Agent Skills matter, and why they've since been published as an open standard for cross-platform portability.
And it's why, going forward, agents that work in production will look very different from the ones we're building today.