Spec-Driven Development With AI Agents: From “Prompt and Pray” to “Specify and Ship”
The biggest shift in AI-assisted development isn't a new model. It's a new workflow. Spec-driven development — where the specification is the source of truth and code is a generated artifact — is rapidly replacing ad-hoc prompting as the way serious teams work with AI agents. Here's what it is, why it works, and how to adopt it today.
The Problem With “Prompt and Pray”
Most developers are still working with AI agents like this: type a prompt, watch the agent generate code, review the output, fix what's wrong, repeat. This is “prompt and pray” — you give a vague instruction and hope the agent interprets it correctly.
Sometimes it works beautifully. Often it doesn't. The agent makes assumptions about your architecture. It invents data models you didn't want. It restructures things you didn't ask to be restructured. And because there's no specification, there's no way to objectively say “this is wrong” — because wrong relative to what?
The deeper problem: without a spec, you can't verify. You can only eyeball the output and use your judgment. For a 20-line function, that works. For a 400-line feature touching 12 files? Good luck catching every assumption the agent made.
What Is Spec-Driven Development?
Spec-driven development (SDD) inverts the workflow. Instead of starting with code, you start with a specification — a clear, structured document that defines what the code should do, how it should behave, and what constraints it must respect. The spec becomes the source of truth. Code is a generated artifact that implements the spec.
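The rule is simple: if the code and the spec disagree, you fix the code, not the spec. If the spec itself turns out to be wrong, you change the spec first and then bring the code in line with it.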
This isn't new: specification-first approaches have existed for decades. What's new is that AI agents make it practical. As the analysis of SDD tools published on martinfowler.com and Augment Code's complete guide both highlight, writing a spec used to feel like overhead because you still had to write all the code yourself. Now the agent writes the code. Your job is writing the spec and reviewing the output. And writing a spec is faster, easier, and higher-leverage than writing code.
As GitHub's Spec Kit documentation puts it: “Your primary role is to steer; the coding agent does the bulk of the writing.”
The Three Phases of Spec-Driven Development
Whether you use GitHub's Spec Kit, Amazon's Kiro, or any other SDD framework, the core workflow has three phases. Archie implements all three with a twist: human approval between each.
Phase 1: Specify
You provide a high-level description of what you want. The AI agent generates a detailed specification: what components are affected, what data model changes are needed, what the API contract looks like, what the edge cases and risks are, and what's explicitly out of scope.
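To make the shape of such a spec concrete, here is a minimal Python sketch of the sections listed above. The `Spec` class, its field names, and the `missing_sections` helper are hypothetical illustrations; they are not the format used by Archie, Spec Kit, or Kiro.

```python
from dataclasses import dataclass, field


@dataclass
class Spec:
    """Hypothetical container for the sections a generated spec should cover."""

    title: str
    components: list[str] = field(default_factory=list)          # affected components
    data_model_changes: list[str] = field(default_factory=list)  # tables, fields, migrations
    api_contract: list[str] = field(default_factory=list)        # endpoints or operations
    edge_cases: list[str] = field(default_factory=list)          # edge cases and risks
    out_of_scope: list[str] = field(default_factory=list)        # explicit exclusions

    def missing_sections(self) -> list[str]:
        """Return the names of sections that are still empty."""
        sections = {
            "components": self.components,
            "data_model_changes": self.data_model_changes,
            "api_contract": self.api_contract,
            "edge_cases": self.edge_cases,
            "out_of_scope": self.out_of_scope,
        }
        return [name for name, items in sections.items() if not items]


draft = Spec(
    title="Team invitations",
    components=["api", "web"],
    data_model_changes=["new team_invitations table"],
    api_contract=["create", "accept", "list pending"],
)
print(draft.missing_sections())  # ['edge_cases', 'out_of_scope']
```

A check like this is only a completeness aid. It tells you a section is empty, not whether its content is right; that judgment stays with the reviewer.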
This is the design phase. In Archie, it's the /architect skill. The output is a spec — not code. You review the spec, not a PR. Reviewing a design takes 2-3 minutes and catches the mistakes that would cost 2-3 hours if they made it into code.
Addy Osmani's guide on writing good specs for AI agents highlights a critical insight: the spec should be a living document that evolves as you and the agent make decisions. It's not a waterfall artifact you write once and hand off — it's a contract you iterate on together.
Phase 2: Decompose
The agent takes the approved spec and breaks it into small, implementable tasks. Each task has clear boundaries: one service, a handful of files, specific done-when criteria. The task decomposition is itself a spec — a plan that you review before implementation begins.
In Archie, this is the /tech-lead skill. The output is a backlog of tasks, each scoped tightly enough to produce a reviewable PR. You approve the decomposition, and the tasks become the agent's implementation roadmap.
This phase is what separates SDD from just “writing a longer prompt.” Breaking a spec into tasks creates natural checkpoints. Each task can be implemented, tested, and reviewed independently. If one task reveals a flaw in the spec, you update the spec and the remaining tasks — you don't throw away all the code.
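The task boundaries described above can be sketched as a small data structure with a scope check. This is an illustration under stated assumptions: the `Task` fields, the `MAX_FILES_PER_TASK` threshold of 8, and the file paths are all hypothetical, chosen only to show what "tightly scoped" could mean in practice.

```python
from dataclasses import dataclass, field

MAX_FILES_PER_TASK = 8  # assumed cutoff for "a handful of files"


@dataclass
class Task:
    """Hypothetical task record: one service, a few files, explicit done-when criteria."""

    task_id: str
    service: str
    files: list[str]
    done_when: list[str]
    depends_on: list[str] = field(default_factory=list)

    def scope_problems(self) -> list[str]:
        """Return reasons this task is not yet reviewable; empty means it looks well scoped."""
        problems = []
        if not self.files:
            problems.append("no files listed")
        elif len(self.files) > MAX_FILES_PER_TASK:
            problems.append(f"touches {len(self.files)} files (limit {MAX_FILES_PER_TASK})")
        if not self.done_when:
            problems.append("no done-when criteria")
        return problems


migration = Task(
    task_id="T-020",
    service="api",
    files=["db/schema.ts", "db/migrations/add_team_invitations.sql"],  # hypothetical paths
    done_when=["migration applies cleanly", "schema exports team_invitations"],
)
vague = Task(task_id="T-999", service="api", files=[], done_when=[])

print(migration.scope_problems())  # []
print(vague.scope_problems())      # ['no files listed', 'no done-when criteria']
```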
Phase 3: Implement
The agent implements one task at a time. Each task produces a focused PR: a branch, scoped code changes, tests that verify the task's done-when criteria, and a conventional commit. In Archie, this is the /dev-agent skill.
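As a rough illustration of the per-task artifacts, the sketch below derives a branch name and a commit message from a task. The header follows the Conventional Commits shape `type(scope): description`; the helper names, the branch naming scheme, and the `Refs:` footer are assumptions made for this example, not Archie's actual conventions.

```python
import re


def branch_name(task_id: str, summary: str) -> str:
    """Build a branch name such as 't-020/add-team-invitations-migration' (assumed scheme)."""
    slug = re.sub(r"[^a-z0-9]+", "-", summary.lower()).strip("-")
    return f"{task_id.lower()}/{slug}"


def commit_message(kind: str, scope: str, summary: str, task_id: str) -> str:
    """Format a Conventional Commits message with a footer that points back to the task."""
    return f"{kind}({scope}): {summary}\n\nRefs: {task_id}"


print(branch_name("T-020", "Add team_invitations migration"))
print(commit_message("feat", "db", "add team_invitations migration", "T-020"))
```

Putting the task ID in both the branch and the commit footer keeps each PR traceable to the task that produced it.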
The implementation phase is where the agent earns its keep. It writes the code, the tests, and the commit messages. You review the PR — and because the task is small and the spec is clear, the review takes minutes, not hours.
Where Archie Fits in the SDD Landscape
The spec-driven development space is exploding. GitHub Spec Kit has 72K+ stars. Amazon built Kiro around the concept. Thoughtworks unpacked SDD as a key engineering practice and put it on the Technology Radar. Vishal Mysore mapped 30+ agentic coding frameworks implementing some version of this workflow.
Archie's approach has three differences from most SDD tools:
1. It lives in your repo, not in a platform. Archie is a set of files — markdown, JSON, and skill definitions — that you drop into your project root. No SaaS platform, no API keys, no vendor lock-in. The spec, the memory, the tasks, and the workflow are all in your git history. Your team owns them.
2. It includes persistent memory. Most SDD tools handle the specify → decompose → implement pipeline. Archie adds memory that persists across sessions: your architecture, past decisions, known issues, and conventions. The spec phase is better because the agent already knows your system. It doesn't propose a MongoDB data model when your entire project is on Postgres.
3. Human approval is built into every transition. Every phase boundary has an approval gate. The agent proposes a spec, you approve. The agent proposes a task breakdown, you approve. This isn't optional ceremony — it's the mechanism that prevents the agent from implementing a wrong spec at full speed. The Thoughtworks Technology Radar warns about “overly rigid, opinionated processes” in SDD tools. Archie's gates are the opposite — they're conversational. “Looks good,” “change the database approach,” “scope out billing for v1.” Natural language, not ceremony.
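The approval gates in the third point can be pictured as a small state machine in which a phase cannot advance until a human has approved its output. The sketch below is hypothetical: the `Workflow` class and its methods are invented for illustration and say nothing about how Archie is implemented.

```python
from enum import Enum


class Phase(Enum):
    SPECIFY = "specify"
    DECOMPOSE = "decompose"
    IMPLEMENT = "implement"


ORDER = [Phase.SPECIFY, Phase.DECOMPOSE, Phase.IMPLEMENT]


class Workflow:
    """Hypothetical gate: each phase's output needs human approval before the next starts."""

    def __init__(self) -> None:
        self.phase = Phase.SPECIFY
        self.approved = False
        self.feedback: list[str] = []

    def review(self, approved: bool, note: str = "") -> None:
        """Record the human decision on the current phase's output."""
        self.approved = approved
        if note:
            self.feedback.append(note)

    def advance(self) -> Phase:
        """Move to the next phase, refusing if the current output is not approved."""
        if not self.approved:
            raise PermissionError(f"{self.phase.value} output has not been approved")
        index = ORDER.index(self.phase)
        if index + 1 < len(ORDER):
            self.phase = ORDER[index + 1]
            self.approved = False  # the next phase starts unapproved
        return self.phase


wf = Workflow()
wf.review(False, "change the database approach")  # agent must revise the spec
wf.review(True, "Looks good")
print(wf.advance().value)  # decompose
```

The important property is that approval resets at every transition, so an approved spec never implies an approved task breakdown.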
SDD in Practice: A Worked Example
Let's walk through spec-driven development on a real feature: adding team invitation functionality to a SaaS app.
The Prompt (10 seconds)
/architect
“We need a team invitation system. Team admins can invite users by email. Invitations expire after 7 days. Invited users get an email with a link to accept.”
The Spec (generated in ~60 seconds)
The agent produces a specification that covers: a new team_invitations table with fields for email, token, expiry, and status; three API endpoints (create, accept, list pending); an email notification via the existing Resend integration; frontend components for the invite modal and pending invitations list; edge cases (already-a-member, expired, duplicate email); and an explicit out-of-scope note: “No role selection in v1 — all invitees get the 'member' role.”
You read the spec. It caught the role assignment question you would have forgotten about. The out-of-scope boundary prevents the agent from building a full RBAC system when you just need invitations. You say: “Approved. Also add a decline endpoint.”
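The rules in this spec are small enough to sketch directly. The Python below models the 7-day expiry, the already-a-member and duplicate-email edge cases, and the accept and decline responses. It is an in-memory illustration only: the function names, the status strings, and the use of plain lists and sets instead of the team_invitations table are assumptions made for the example.

```python
import secrets
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

INVITE_TTL = timedelta(days=7)  # invitations expire after 7 days


@dataclass
class Invitation:
    email: str
    created_at: datetime
    token: str = field(default_factory=lambda: secrets.token_urlsafe(32))
    status: str = "pending"  # assumed values: pending, accepted, declined

    @property
    def expires_at(self) -> datetime:
        return self.created_at + INVITE_TTL


def create_invitation(email: str, members: set[str], pending: list[Invitation],
                      now: datetime) -> Invitation:
    """Create an invitation, rejecting existing members and live duplicates."""
    email = email.strip().lower()
    if email in members:
        raise ValueError("already a member")
    if any(i.email == email and i.status == "pending" and now < i.expires_at for i in pending):
        raise ValueError("duplicate invitation")
    invite = Invitation(email=email, created_at=now)
    pending.append(invite)
    return invite


def respond(invite: Invitation, accept: bool, now: datetime) -> str:
    """Accept or decline a pending, unexpired invitation and return the new status."""
    if invite.status != "pending":
        raise ValueError("invitation already resolved")
    if now >= invite.expires_at:
        raise ValueError("invitation expired")
    invite.status = "accepted" if accept else "declined"
    return invite.status


start = datetime(2025, 1, 1, tzinfo=timezone.utc)
team_members = {"admin@example.com"}  # emails stored in lowercase
open_invites: list[Invitation] = []
first = create_invitation("New.User@example.com", team_members, open_invites, start)
print(first.email, first.expires_at.date())
print(respond(first, accept=True, now=start + timedelta(days=2)))
```

Passing `now` explicitly instead of reading the clock inside each function keeps the expiry rule easy to test.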
The Task Breakdown (generated in ~30 seconds)
T-020: Database migration + Drizzle schema for team_invitations
T-021: API endpoints — create, accept, decline, list pending
T-022: Email notification with invitation link (Resend)
T-023: Frontend — invite modal + pending invitations list
Each task lists the files it will touch and the done-when criteria. T-020 has no dependencies. T-021 depends on T-020. T-022 and T-023 both depend on T-021: the email needs the invitation token, and the frontend needs the API endpoints. You approve.
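The dependencies between these four tasks form a small graph, and the implementation order falls out of a topological sort. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (3.9+); grouping the result into "waves" of tasks that could be picked up together is an illustrative choice, not something the workflow requires.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on.
dependencies = {
    "T-020": set(),
    "T-021": {"T-020"},
    "T-022": {"T-021"},
    "T-023": {"T-021"},
}

sorter = TopologicalSorter(dependencies)
sorter.prepare()  # raises graphlib.CycleError if the tasks depend on each other in a loop

waves = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # tasks whose dependencies are all done
    waves.append(ready)
    sorter.done(*ready)

print(waves)  # [['T-020'], ['T-021'], ['T-022', 'T-023']]
```

T-022 and T-023 land in the same wave because neither depends on the other, so they can be implemented in either order once T-021 is merged.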
Implementation (4 PRs over ~40 minutes)
The agent picks up T-020. Branch, migration, schema, tests. PR with 4 files. You review, merge. The agent picks up T-021. Endpoints, validation, error handling. PR with 7 files. You review, merge. And so on.
Total time: about 45 minutes including your review time. Four clean PRs. Each traceable to a task. Each task traceable to the spec. The spec traceable to the original requirement.
If someone asks “why did we build it this way?” in six months, the answer is in the spec. Not in a Slack thread. Not in someone's head. In the repo, in version control, in the spec the team approved.
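The traceability chain can be modeled as a linked record, where each artifact points at the thing it was derived from. This is a minimal sketch with hypothetical names: the `Link` class, the `trace` helper, and the identifiers for the PR and the spec are invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Link:
    """One artifact in the chain, pointing at the artifact it was derived from."""

    kind: str  # "pr", "task", "spec", or "requirement"
    ref: str
    parent: Optional["Link"] = None


def trace(link: Link) -> list[str]:
    """Walk from an artifact back to the original requirement."""
    chain = []
    current: Optional[Link] = link
    while current is not None:
        chain.append(f"{current.kind}:{current.ref}")
        current = current.parent
    return chain


requirement = Link("requirement", "team invitation system")
spec = Link("spec", "team-invitations", requirement)        # hypothetical spec name
task = Link("task", "T-020", spec)
pr = Link("pr", "t-020/add-team-invitations-migration", task)  # hypothetical branch name

print(" -> ".join(trace(pr)))
```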
“Is This Just Waterfall?”
This is the most common pushback against SDD. It looks like waterfall: spec first, then implement. But there are crucial differences.
Waterfall specs take weeks. SDD specs take minutes. The agent generates the spec in 60 seconds. You iterate on it in 2-3 minutes. The total design phase is under 5 minutes for most features. This is nothing like the multi-week requirements-gathering process of traditional waterfall.
Waterfall specs are rigid. SDD specs are living documents. If implementation reveals a flaw in the spec, you update the spec. If a task turns out to be more complex than expected, you re-decompose. The spec evolves with the code. The key constraint is that the spec leads — you update the spec first, then the code follows.
Waterfall delays feedback. SDD accelerates it. In waterfall, you don't see working code until the end. In SDD with small tasks, you see a working PR within minutes of the first task. Feedback is immediate. The loop is tight.
SDD isn't waterfall. It's the design-first ethos of good engineering, stripped of the ceremony, and accelerated by AI.
Getting Started With Spec-Driven Development
You don't need a framework to start doing SDD. You need a mindset shift: describe what you want before asking the agent to build it. Review the description. Then build.
But a framework helps. It turns the mindset into a repeatable process with built-in quality gates. Here's the progression:
Level 1 (context file): Add a CLAUDE.md to your project with architecture, patterns, and boundaries. The agent's specs will immediately be better because it knows your system.
Level 2 (manual SDD): Before any “build this” prompt, ask the agent to “design this first.” Review the design. Then say “now break it into tasks.” Review the tasks. Then “implement task 1.” This gives you the SDD workflow with zero tooling.
Level 3 (full SDD framework): Use a tool that bakes the workflow into skills, adds persistent memory, and manages the task lifecycle. That's what Archie does: /architect for the spec, /tech-lead for the decomposition, /dev-agent for the implementation, with approval gates between each.
The specification is the new source code. Start treating it that way.