
Why Your AI Agent Goes Off the Rails (And How to Fix It)

Every developer using AI coding agents has had the experience: you give a clear instruction, the agent produces hundreds of lines of code, and it's all wrong. Not syntactically wrong — architecturally wrong. Here are the five patterns that cause this, and the structural fix for each.

Pattern 1: The Scope Creep

What happens: You ask the agent to add a date field to a form. It adds the field, but also restructures the form layout, refactors the validation logic, updates the API response format, and adds a feature you didn't ask for.

Why it happens: AI agents optimize for "completeness." When they see related code that could be improved, they improve it. They don't have a concept of task scope — everything is one big task.

The fix: Give the agent an explicit task scope with a file list. "Touch only these 3 files. If you find something else that should be fixed, log it as a separate task — don't fix it now." This is how we handle it: each task in the backlog has a defined file list and done-when criteria, and the agent is instructed not to modify anything outside the task scope.
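A scope check like this can even be automated. Here is a minimal sketch: the file paths and the `check_scope` helper are illustrative assumptions, not part of any particular framework.

```python
# Hypothetical task scope: the only files this task is allowed to touch.
ALLOWED_FILES = {
    "src/forms/EventForm.tsx",
    "src/forms/validation.ts",
    "src/api/events.ts",
}

def check_scope(changed_files):
    """Return any files the agent touched outside the task's file list."""
    return sorted(set(changed_files) - ALLOWED_FILES)

out_of_scope = check_scope([
    "src/forms/EventForm.tsx",
    "src/forms/FormLayout.tsx",  # not in the task's file list
])
if out_of_scope:
    print("Out of scope, log as separate tasks:", out_of_scope)
```

Running this as a pre-commit or CI step turns "don't touch other files" from a polite request into a hard boundary.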

Pattern 2: The Context Amnesia

What happens: You start a new session, and the agent has no idea what happened yesterday. It re-proposes a solution you already rejected. It uses a library you decided against. It ignores a pattern you established last week.

Why it happens: AI agents have no memory between sessions. Every conversation starts fresh. The agent literally does not know what decisions you've made unless you tell it again.

The fix: Persistent memory files that the agent reads at the start of every session. Not a giant context dump — structured files for specific purposes: one for architecture, one for coding patterns, one for past decisions (ADRs), one for current work. The agent reads the relevant files and knows your context in 10 seconds instead of 10 minutes of re-explaining.
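As a sketch of what "reads the relevant files at session start" can look like, here is a minimal loader. The file names and directory layout are illustrative assumptions, not a fixed convention.

```python
from pathlib import Path

# Hypothetical memory layout: one structured file per purpose.
MEMORY_FILES = [
    "memory/architecture.md",   # system components and boundaries
    "memory/patterns.md",       # established coding patterns
    "memory/decisions.md",      # past decisions (ADRs)
    "memory/current-work.md",   # in-progress tasks and state
]

def load_context(root="."):
    """Concatenate whatever memory files exist into a session preamble."""
    parts = []
    for name in MEMORY_FILES:
        path = Path(root) / name
        if path.exists():
            parts.append(f"## {name}\n{path.read_text()}")
    return "\n\n".join(parts)
```

The point is the structure, not the code: small, purpose-specific files beat one giant context dump, because the agent (or you) can pull in only what the current task needs.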


Pattern 3: The Architecture Sprint

What happens: You ask for a feature, and the agent makes a significant architectural decision on its own — adds a message queue, introduces a new microservice, creates a caching layer — without checking if that's what you wanted.

Why it happens: The agent is trying to build the "best" solution. It doesn't know that your team has a policy against adding new infrastructure without a review. It doesn't know that you already have a Redis instance but don't want to add another caching layer.

The fix: Separate the design phase from the implementation phase, with an approval gate in between. The agent designs first — producing an architecture proposal with component impact, data model changes, and technical risks. You review this before any code is written. The agent cannot write code until you say "looks good."
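The gate itself can be modeled as a tiny state machine. This is a sketch under assumed names (`Workflow`, `submit_design`, `approve`), not the interface of any specific tool:

```python
# Minimal design-then-implement gate: no code until explicit approval.
class Workflow:
    def __init__(self):
        self.proposal = None
        self.approved = False

    def submit_design(self, proposal):
        """Design phase output: components, data model changes, risks."""
        self.proposal = proposal
        self.approved = False  # any new proposal resets the gate

    def approve(self):
        """The human review step: 'looks good'."""
        if self.proposal is None:
            raise RuntimeError("nothing to approve")
        self.approved = True

    def implement(self):
        """Implementation is blocked until the design is approved."""
        if not self.approved:
            raise RuntimeError("design not approved; no code yet")
        return f"implementing: {self.proposal['summary']}"
```

Note that submitting a revised design resets the gate, so a changed plan always goes back through review.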

This single change — design review before implementation — prevents more wasted hours than any other intervention. It's the difference between catching a wrong turn at the intersection versus catching it after you've driven 50 miles in the wrong direction.

Pattern 4: The Mega-PR

What happens: The agent produces a single PR with 500+ lines of changes across 15 files. You can't effectively review it. You approve it because you're busy, and subtle bugs slip through.

Why it happens: Without task decomposition, the agent treats the entire feature as one unit of work. It produces one PR because it did one task — even though that "task" was actually five separate concerns.

The fix: Break features into small, focused tasks before implementation. Each task targets one service, touches no more than 15 files, and has specific done-when criteria. The agent implements one task per session and opens one PR per task. The result: 5 small, reviewable PRs instead of 1 impossible one.

The 15-file limit is deliberate. Research on code review effectiveness shows that review quality drops dramatically above ~400 lines of diff. By capping files per task, you naturally keep diffs small and reviewable.
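A task definition with these constraints can be validated mechanically. The `Task` shape below is a hypothetical sketch of the idea, with the 15-file cap from above baked in:

```python
from dataclasses import dataclass, field

MAX_FILES = 15  # cap that keeps per-task diffs small and reviewable

@dataclass
class Task:
    name: str
    files: list = field(default_factory=list)
    done_when: list = field(default_factory=list)

    def validate(self):
        """Reject tasks too broad to review as a single PR."""
        errors = []
        if len(self.files) > MAX_FILES:
            errors.append(
                f"{self.name}: touches {len(self.files)} files (max {MAX_FILES})"
            )
        if not self.done_when:
            errors.append(f"{self.name}: missing done-when criteria")
        return errors
```

A task that fails validation gets split before any implementation starts, which is exactly how one feature becomes five small PRs instead of one impossible one.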

Pattern 5: The Dependency Tangle

What happens: The agent starts implementing a feature that depends on another feature that hasn't been built yet. It either builds a hacky workaround, or it builds the dependency too — expanding the scope and creating a tangled PR.

Why it happens: The agent doesn't have a dependency graph. It doesn't know that "add email reminders" depends on "add due dates to the data model" — it just tries to build everything at once.

The fix: Explicit dependency mapping in the task breakdown. Before implementation, the Tech Lead phase produces a dependency graph: which tasks can run in parallel, which must be sequential, and what must exist before each task can start. The dev agent checks for unmet dependencies before starting work and stops if something is missing.
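The dependency check before work starts can be as simple as this sketch. The task IDs and graph shape are illustrative, borrowed from the email-reminders example above:

```python
# Hypothetical dependency graph from the task breakdown:
# each task maps to the tasks that must be completed first.
DEPENDS_ON = {
    "add-due-dates": [],
    "add-email-reminders": ["add-due-dates"],
}

def unmet_dependencies(task, completed):
    """Tasks that must exist before `task` can start."""
    return [dep for dep in DEPENDS_ON.get(task, []) if dep not in completed]

blockers = unmet_dependencies("add-email-reminders", completed=set())
# blockers == ["add-due-dates"] -> the dev agent stops here instead of
# building a workaround or silently expanding scope
```

The same graph also tells you which tasks have no unmet dependencies and can therefore run in parallel.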

The Common Thread

All five patterns have the same root cause: the AI agent is operating without structure. It doesn't have context (memory), it doesn't have boundaries (roles and scope), and it doesn't have checkpoints (approval gates).

The fix is not to use AI less — it's to give AI more structure. Think of it like hiring a brilliant junior developer who's incredibly fast but has never seen your codebase. You wouldn't say "go build the notification system" and walk away. You'd give them an onboarding doc, explain the architecture, have them write a design doc for review, and break the work into small PRs. That's exactly what structured AI development looks like.

Implementing This in Practice

We built Archie to codify these fixes into a drop-in framework. It gives your AI agent persistent memory (files it reads every session), role-based skills (Architect, Tech Lead, Dev Agent), and mandatory approval gates between phases.

But even if you don't use Archie, the principles apply. The minimum viable version is three things: a markdown file describing your architecture that the agent reads at the start of every session, a rule that says "describe what you're going to build before you build it," and a task list with small, scoped units of work. Just these three things will dramatically reduce the amount of time you spend cleaning up after your AI agent.