Originally published at kunalganglani.com — read it there for inline code, hero image, and live links.
What Is Loop Engineering?
Loop engineering is the practice of writing persistent instructions, skills, subagents, and hooks that make AI agents iterate autonomously — gathering context, taking action, verifying results, and course-correcting — instead of stopping after a single response. It replaces one-shot prompting with self-correcting agent loops that run until a measurable goal is met.
If you've been using Claude Code to ask questions and paste answers into your editor, you're leaving most of its capability on the table. Claude Code shipped a full agentic AI operating system (skills, subagents, hooks, dynamic workflows, headless CI mode), and most developers haven't opened any of it.
I've spent the last several months rebuilding how my team works with Claude Code. The shift from "ask a question, get an answer" to "define a loop, let Claude run until tests pass" has been the single biggest productivity unlock I've experienced since adopting AI coding tools. Not a marginal improvement. A fundamentally different way of working. This guide walks you through the concrete primitives that make loop engineering work, drawn from Anthropic's official documentation and from what I've actually shipped with them.
Why Loop Engineering Matters More Than Better Prompts
Here's the thing nobody's saying about prompt engineering: it has a ceiling. You can craft the perfect single-shot prompt, and Claude will give you a great first attempt. But software engineering isn't about first attempts. It's about iteration. Write code, run tests, see failures, fix failures, run tests again. That's the actual workflow.
Jeel Vankhede, a software engineer writing on Dev.to, nailed this distinction: most engineers use AI as a Q&A tool rather than engineering with AI as an autonomous agent integrated into their workflow. The gap between those two modes is enormous.
The Anthropic docs describe Claude Code's agentic loop as three phases that blend together and repeat: gather context, take action, and verify results. A bug fix cycles through all three repeatedly. A refactor might involve extensive verification. Claude decides what each step requires based on what it learned from the previous step. It chains dozens of actions together and course-corrects along the way.
This is fundamentally different from prompting. Prompting is a request-response pattern. Loop engineering is designing the system that lets Claude keep going.
After shipping features with both approaches, I can tell you: the loop-engineered version catches edge cases the one-shot version misses every single time. Not because the model is smarter. Because it gets more attempts and more feedback.
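The gather, act, verify cycle can be sketched as a plain control loop. This is a conceptual illustration in Python, not how Claude Code is implemented internally; `act`, `verify`, `Verdict`, and the `max_attempts` cap are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Verdict:
    passed: bool
    feedback: str = ""  # e.g. failing test output, fed into the next attempt


def run_loop(
    act: Callable[[List[str]], None],
    verify: Callable[[], Verdict],
    max_attempts: int = 5,
) -> int:
    """Act, verify, and retry with accumulated feedback until verification passes.

    Returns the number of attempts used, or raises once the cap is reached.
    """
    history: List[str] = []  # context gathered from earlier failures
    for attempt in range(1, max_attempts + 1):
        act(history)                      # take action, informed by prior feedback
        verdict = verify()                # verify results
        if verdict.passed:
            return attempt
        history.append(verdict.feedback)  # gather context for the next pass
    raise RuntimeError(f"gave up after {max_attempts} attempts")


# Toy usage: the "fix" only lands on the third attempt.
state = {"tries": 0}


def fake_act(history: List[str]) -> None:
    state["tries"] += 1


def fake_verify() -> Verdict:
    ok = state["tries"] >= 3
    return Verdict(ok, "" if ok else f"test failed on try {state['tries']}")


attempts_used = run_loop(fake_act, fake_verify)
```

The point of the sketch is the shape: a one-shot prompt is this loop with `max_attempts=1` and no `verify` step, which is why it never sees its own failures.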
The Single Prompt Pattern That Turns One-Shot Into a Loop
Before getting into skills, subagents, and hooks, there's one pattern that transforms any prompt into a loop. Anthropic's prompt library spells it out, and it's almost embarrassingly simple:
"Give it a way to check its own work — ask for run, test, compare, or verify in the same prompt so Claude iterates instead of stopping after one attempt."
In practice, it's the difference between this:
"Write a database migration to add a status column to the orders table."
And this:
"Write the migration, run it against the dev database, and confirm the schema matches the Order model."
That second prompt creates a loop. Claude writes the migration, runs it, checks the result, and if the schema doesn't match, it fixes the migration and tries again. The first prompt produces a file. The second prompt produces a working system.
I've distilled five patterns from Anthropic's docs that enable loop engineering in any prompt:
- Describe the outcome, not the steps — let Claude figure out the path
- Include a verification command — "run tests", "compare output", "check the build"
- Point at a reference — an existing file, pattern, or spec that Claude can compare against
- State a measurable target — "get the bundle size under 200KB and show me what you removed"
- Give it the artifact — paste errors, logs, screenshots so Claude has real feedback to iterate on
This is the foundation. Everything else in loop engineering builds on this pattern of embedding verification into the instruction.
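If you generate prompts from scripts, the five patterns above can be folded into a small helper. This is a sketch only: `build_loop_prompt` and its parameter names are made up for illustration, and the wording it emits is one reasonable phrasing, not a format Anthropic prescribes.

```python
from typing import Optional


def build_loop_prompt(
    outcome: str,
    verify_command: str,
    target: Optional[str] = None,
    reference: Optional[str] = None,
    artifact: Optional[str] = None,
) -> str:
    """Compose a prompt that states an outcome and how to check it."""
    parts = [outcome.strip()]
    if reference:
        parts.append(f"Follow the pattern in {reference}.")
    # The verification command is what turns a request into a loop.
    parts.append(
        f"Then run `{verify_command}` and keep fixing and re-running until it passes."
    )
    if target:
        parts.append(f"Done means: {target}.")
    if artifact:
        parts.append("Here is the current failure output:\n" + artifact.strip())
    return " ".join(parts)


prompt = build_loop_prompt(
    outcome="Add a status column to the orders table via a migration.",
    verify_command="npm test",
    target="the schema matches the Order model",
)
```

The helper makes the verification command a required argument on purpose: a prompt without one is back to being a one-shot request.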
CLAUDE.md: Your Loop's Operating System
Every Claude Code session starts with a fresh context window. Without persistent instructions, every session forgets your conventions, your test commands, your deployment process. CLAUDE.md files solve this.
Anthropic's memory docs lay out how CLAUDE.md files work: they're loaded at the start of every session into the context window, encoding persistent workflow instructions, loop behaviors, team conventions, and environment setup.
Think of CLAUDE.md as your loop's operating system. It tells Claude how to iterate in your specific codebase. Here's what an effective loop-engineering CLAUDE.md includes:
- Verification commands: "Always run npm test after modifying any file in /src. If tests fail, fix the code and re-run until green."
- Quality gates: "Never commit code with TypeScript errors. Run tsc --noEmit before considering any task complete."
- Loop termination criteria: "A task is done when tests pass, linting is clean, and the PR description is written."
- Environment context: what database to use, which branch conventions to follow, where secrets live.
You can scope rules to specific file types using .claude/rules/ subdirectories. A rule scoped to *.tsx files might enforce component patterns. A rule scoped to *.sql files might enforce migration conventions. Different parts of your codebase need different loop behaviors, and this gives you that granularity.
I've seen teams dump their entire style guide into CLAUDE.md and wonder why Claude ignores half of it. This is wrong. The more specific and concise your instructions, the more consistently Claude follows them. Treat it like production configuration, not a wiki page. If your CLAUDE.md reads like a novel, you've already lost.
The hierarchy matters too: CLAUDE.md supports project-level, directory-level, and user-level scoping. Project-level for team conventions, directory-level for module-specific rules, user-level for personal preferences across all projects. Having built systems across multiple repositories, I can say this three-tier approach maps perfectly to how engineering organizations actually structure decisions. Your team agrees on conventions at the top, individual modules enforce their own patterns, and each developer gets their quirks respected.
Skills: Reusable Loop Engineering Components
Once you've embedded loop behavior into CLAUDE.md, the next step is extracting repeatable procedures into skills. Anthropic's skills docs explain the mechanics: place a SKILL.md file in its own directory under .claude/skills/. The directory name becomes the slash command, so a file at .claude/skills/deploy/SKILL.md creates /deploy.
Here's what makes skills different from CLAUDE.md for loop engineering: a skill's body loads only when invoked. You can create detailed, multi-step procedures without paying a context-window cost until you actually need them. This matters a lot more than it sounds.
Three skills every team should build:
A /loop skill that wraps any task in an iteration cycle. Its instructions tell Claude: "Execute the task. Run verification. If verification fails, analyze the failure, fix the code, and verify again. Repeat until all checks pass or you've made 5 attempts." I've shipped enough features to know that the attempt cap is critical. Without it, Claude will chase its tail on genuinely unsolvable problems.
A /review skill that adds an adversarial review step. After Claude finishes implementing, it switches roles and reviews its own work with a critical eye — looking for edge cases, performance issues, and security vulnerabilities. This catches a surprising amount of stuff. Not everything, but more than you'd expect.
A /deploy skill that chains together your full deployment process: run tests, build, verify the build output, deploy to staging, run smoke tests, and only then open a PR for production.
The skills frontmatter system is what makes this genuinely powerful. Each skill can specify:
- Model selection: use a faster, cheaper model for iteration cycles and the most capable model for final review
- Tool whitelisting/blacklisting: restrict what Claude can do during specific phases
- Subagent delegation: automatically run the skill in an isolated context
- Pre-approved tools: bypass permission prompts for trusted operations
- Argument passing via $ARGUMENTS: make skills composable and reusable
So your /loop skill can burn through a cheap model for rapid iteration, while your /review skill calls in the heavy artillery for thorough analysis. You're building an agent orchestration system, not just writing prompts.
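To make the /loop skill concrete, here is a hedged sketch of a Python scaffold that writes one to disk. The path layout and `$ARGUMENTS` come from the text above; the `name` and `description` frontmatter keys are my assumption of a minimal header, so check the skills docs for the exact keys that control model selection and tool access. `write_skill` is a hypothetical helper.

```python
from pathlib import Path
import tempfile

# Frontmatter keys here are assumed, not copied from the docs.
SKILL_TEMPLATE = """---
name: {name}
description: {description}
---

Task: $ARGUMENTS

1. Execute the task.
2. Run verification: `{verify_command}`.
3. If verification fails, analyze the failure, fix the code, and verify again.
4. Stop when all checks pass or after {max_attempts} attempts, and report which.
"""


def write_skill(repo_root: Path, name: str, description: str,
                verify_command: str, max_attempts: int = 5) -> Path:
    """Create .claude/skills/<name>/SKILL.md under repo_root and return its path."""
    skill_dir = repo_root / ".claude" / "skills" / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    skill_file = skill_dir / "SKILL.md"
    skill_file.write_text(
        SKILL_TEMPLATE.format(
            name=name,
            description=description,
            verify_command=verify_command,
            max_attempts=max_attempts,
        ),
        encoding="utf-8",
    )
    return skill_file


# Demo against a throwaway directory instead of a real repo.
with tempfile.TemporaryDirectory() as tmp:
    path = write_skill(
        Path(tmp), "loop", "Wrap a task in a verify-and-retry cycle", "npm test"
    )
    skill_text = path.read_text(encoding="utf-8")
    relative = path.relative_to(tmp).as_posix()
```

Keeping the attempt cap in the skill body, rather than trusting the model to give up on its own, is the part that matters most.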
Subagents: Parallel Loops That Don't Collapse Your Context
Here's the constraint that kills most agentic workflows: context window exhaustion. Anthropic's best practices are clear on this — Claude Code's performance degrades as the context fills. This is the primary failure mode for agent loops. I've watched it happen in real time: a loop that worked beautifully for 20 minutes starts producing nonsense at minute 40.
Subagents solve this. Each subagent runs in its own isolated context window with a custom system prompt, specific tool access, and independent permissions. When a side task would flood your main conversation with search results, logs, or file contents you won't reference again, the subagent does that work in its own sandbox and returns only the summary.
Patterns I've found effective with subagents:
Isolate high-volume operations. When Claude needs to grep through thousands of files or parse lengthy logs, spawn a subagent. It does the heavy lifting and returns a concise answer. Your main context stays clean.
Fan-out architecture. Claude spawns multiple parallel workers — each in their own worktree so concurrent edits don't collide — and aggregates results. I've used this for large-scale refactors where ten files need the same pattern applied. Each subagent handles one file, verifies its own work, and reports back. It's the kind of thing that would take an hour manually and takes minutes with parallel subagents.
Chained subagents. The output of one subagent feeds into the next. A research subagent investigates the problem, a planning subagent designs the solution, an implementation subagent writes the code, and a verification subagent tests it. Each gets a fresh context window optimized for its phase.
Nested subagents. A subagent can spawn its own subagents. This enables recursive decomposition: Claude breaks a large problem into subproblems, each handled by a dedicated agent that might further decompose its own work.
If you've read about multi-agent AI systems in production, this is the Claude Code-native way to implement them. No external framework required. No LangChain, no CrewAI. Just SKILL.md files and subagent definitions that live in your repo alongside the code they operate on.
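The fan-out pattern has the same shape as an ordinary parallel map where each worker keeps its noisy intermediate state to itself and hands back one line. The sketch below is an analogy in plain Python, not the subagent API; `fan_out` and `refactor_one` are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def fan_out(
    items: List[str],
    worker: Callable[[str], str],
    max_workers: int = 4,
) -> Dict[str, str]:
    """Run one worker per item in parallel and keep only each short summary."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        summaries = list(pool.map(worker, items))  # results keep input order
    return dict(zip(items, summaries))


def refactor_one(path: str) -> str:
    # Stand-in for a subagent: imagine it reads, edits, and tests one file,
    # producing lots of intermediate output that never leaves this function.
    scratch = [f"inspecting line {n} of {path}" for n in range(1000)]
    return f"{path}: pattern applied, {len(scratch)} lines inspected"


report = fan_out(["a.ts", "b.ts", "c.ts"], refactor_one)
```

The caller ends up holding three short strings rather than three thousand lines of scratch output, which is the same reason a subagent's summary is cheaper for the main context than its full transcript.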
Hooks: Programmatic Control Over the Agent Loop
Skills and subagents define what Claude does. Hooks control when and how it does it. This is where loop engineering crosses from "clever prompting" into genuine agent framework territory.
Claude Code supports over 20 hook events in its lifecycle, per the hooks reference. The ones that matter most for loop engineering:
- PostToolUseFailure: fires when a tool call fails. You can re-inject context, suggest a fix, or redirect Claude's next action. Your automatic retry mechanism.
- Stop: fires when Claude is about to finish responding. You can prevent premature termination: if tests haven't passed, the hook blocks Claude from stopping.
- TaskCompleted: fires when a delegated task finishes. Triggers the next step in a pipeline. True sequential automation.
- FileChanged: fires when a file is modified. Automatically run linters, tests, or formatters whenever Claude touches a file.
- PreToolUse: fires before Claude executes a tool. Block dangerous operations, require confirmation for destructive actions, or redirect to safer alternatives.
Hooks can emit JSON to add context for Claude or exercise decision control — block, approve, or redirect Claude's next action. This is the mechanism that turns Claude Code from an interactive tool into an autonomous system.
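As an illustration of decision control, here is a minimal PreToolUse guard written as a Python hook script. It assumes, based on my reading of the hooks reference, that the event arrives as JSON on stdin with `tool_name` and `tool_input` fields and that exit code 2 blocks the call and reports stderr back to Claude; verify both against the current docs. The deny-list is hypothetical, and registering the script in your settings file is not shown.

```python
import json
import re
import sys
from typing import Optional

# Hypothetical deny-list; tune it to your own repository.
DANGEROUS_PATTERNS = [
    r"\brm\s+-rf\b",
    r"\bgit\s+push\b.*\b(main|master)\b",
    r"\bdrop\s+table\b",
]


def blocked_reason(event: dict) -> Optional[str]:
    """Return why a tool call should be blocked, or None to allow it."""
    if event.get("tool_name") != "Bash":
        return None  # only shell commands are inspected in this sketch
    command = event.get("tool_input", {}).get("command", "")
    for pattern in DANGEROUS_PATTERNS:
        if re.search(pattern, command, flags=re.IGNORECASE):
            return f"Blocked by policy ({pattern}): {command}"
    return None


def main(stdin=sys.stdin) -> int:
    event = json.load(stdin)            # the hook payload arrives on stdin
    reason = blocked_reason(event)
    if reason:
        print(reason, file=sys.stderr)  # stderr carries the explanation
        return 2                        # assumed "block this call" exit code
    return 0

# A real hook file would end with: sys.exit(main())
```

Keeping the policy in a pure function (`blocked_reason`) means you can unit-test the guard without running Claude at all.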
I've been working with hooks in production for several months now, and I'll give you the single most impactful pattern: a Stop hook that checks test results. If Claude tries to stop and tests haven't passed, the hook injects the test failures back into context and tells Claude to keep going. This single hook eliminated the most common failure mode I saw. Claude loves to declare victory before the code actually works. The Stop hook says "no, you're not done."
Combine that with a PostToolUseFailure hook that provides diagnostic context when commands fail, and you get a self-healing loop. Claude hits an error, gets relevant context about what went wrong, fixes it, continues. No human intervention needed.
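The Stop hook described above might look roughly like this. Treat it as a sketch under assumptions: it relies on the Stop event carrying a `stop_hook_active` flag and on a JSON reply of the form `{"decision": "block", "reason": ...}` keeping Claude going, which matches my understanding of the hooks reference but should be checked there. The `npm test` default and the 4,000-character truncation are placeholders.

```python
import json
import subprocess
import sys
from typing import Optional, Sequence


def run_tests(command: Sequence[str]) -> subprocess.CompletedProcess:
    """Run the project's test command and capture its output."""
    return subprocess.run(command, capture_output=True, text=True)


def stop_decision(event: dict, returncode: int, output: str,
                  max_chars: int = 4000) -> Optional[dict]:
    """Return a blocking decision if tests failed, or None to let Claude stop."""
    if event.get("stop_hook_active"):
        # Claude is already continuing because of a Stop hook. Blocking again
        # here could keep the loop running forever, so let it stop this time.
        return None
    if returncode == 0:
        return None
    tail = output[-max_chars:]  # keep the injected context small
    return {
        "decision": "block",
        "reason": "Tests are still failing. Continue fixing.\n" + tail,
    }


def main(stdin=sys.stdin, test_command: Sequence[str] = ("npm", "test")) -> int:
    event = json.load(stdin)
    result = run_tests(test_command)
    decision = stop_decision(event, result.returncode, result.stdout + result.stderr)
    if decision is not None:
        print(json.dumps(decision))  # JSON on stdout carries the decision
    return 0

# A real hook file would end with: sys.exit(main())
```

One trade-off to be aware of: the `stop_hook_active` guard means this version forces at most one extra round per stop attempt. If you want several, replace the guard with your own attempt counter, but keep some cap so the loop can always end.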
Loop Engineering in CI/CD: Headless Agent Loops
Everything so far assumes you're sitting at your terminal watching Claude work. The real power shows up when Claude runs without you.
Claude Code supports non-interactive / headless mode and can be piped into scripts for CI/CD and batch processing. Combined with auto mode, Claude can be triggered on a schedule or by external events — fully autonomous agent loops with no human in the loop.
Practical applications I've seen work well:
- PR review loops: Claude Code runs on every PR, reviews the diff, runs tests, posts comments. If it finds issues, it opens fix commits automatically.
- Dependency update loops: Claude updates dependencies, runs the full test suite, and only opens a PR if everything passes. If tests break, it investigates and fixes compatibility issues before anyone sees the PR.
- Migration loops: Claude runs a database migration, verifies the schema, runs integration tests against the new schema, and rolls back if anything fails.
- Documentation loops: When source code changes, Claude updates the relevant docs, verifies links aren't broken, and commits the result.
The key for CI/CD loops: scope Claude's permissions tightly. Use hook-based LLM security controls to prevent destructive operations, restrict file access to relevant directories, and require that Claude creates branches rather than pushing to main. I've learned this the hard way. An autonomous agent with write access to main is a story that ends badly.
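For a CI job, tight scoping can start at the command line. The sketch below builds the argument list for a non-interactive run using the `-p`, `--output-format`, `--max-turns`, and `--allowedTools` flags as I understand them from the CLI reference; confirm the flag names and the tool-pattern syntax against your installed version. `headless_command` and the example tool list are hypothetical.

```python
import shlex
from typing import List, Sequence


def headless_command(
    prompt: str,
    allowed_tools: Sequence[str],
    max_turns: int = 20,
) -> List[str]:
    """Build an argv list for a non-interactive Claude Code run."""
    return [
        "claude",
        "-p", prompt,                   # print mode: run once, no interactive UI
        "--output-format", "json",      # machine-readable result for the pipeline
        "--max-turns", str(max_turns),  # hard cap on loop length
        "--allowedTools", ",".join(allowed_tools),
    ]


argv = headless_command(
    "Update dependencies, run the test suite, and summarize any failures.",
    allowed_tools=["Read", "Edit", "Bash(npm test)"],
    max_turns=15,
)
printable = shlex.join(argv)  # handy for logging the exact command in CI

# In the pipeline you would then run something like:
#   subprocess.run(argv, capture_output=True, text=True, check=True)
```

Passing an explicit argv list instead of a shell string avoids quoting problems when the prompt contains spaces or quotes, and the turn cap gives the job a predictable upper bound.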
This is where loop engineering connects to production AI. You're not just using Claude as a development tool. You're embedding it into your engineering infrastructure as an autonomous worker with defined responsibilities and guardrails.
Context Window Management: Why Loops Fail and How to Fix It
I promised this guide would be practical, so let me talk about the thing that actually breaks agent loops: context exhaustion.
Claude's context window fills up fast. When it does, performance degrades. Your carefully designed loop starts producing garbage because Claude literally can't remember what it was doing. I've watched this happen with loops that work perfectly on small tasks and catastrophically fail on large ones. It's the most frustrating debugging experience because the system looks correct — it just runs out of room.
Best practices I've arrived at for keeping loops running cleanly:
Use subagents aggressively. Every side investigation, every log analysis, every research tangent should run in a subagent. Your main context stays focused on the primary task. I treat this as the default now. If a step doesn't directly advance the main goal, it goes in a subagent.
Use /compact strategically. When you notice Claude's responses getting less coherent, compact the context. This summarizes the conversation so far, freeing up space for new work.
Design loops with fresh contexts per phase. Instead of one long loop, chain subagents where each phase starts clean. The planning subagent passes a structured plan to the implementation subagent, which passes results to the verification subagent. Each gets maximum context for its job.
Keep CLAUDE.md concise. Every word in CLAUDE.md eats into your working context. I've seen teams with 2,000-word CLAUDE.md files wondering why Claude forgets instructions halfway through a task. Cut it down. Aim for the minimum viable configuration.
Scope skills narrowly. The skill frontmatter system lets you restrict which tools a skill can use. Fewer tools means less context spent on tool definitions, leaving more room for actual work.
This is the same challenge you face with any large language model in production. The context window is a finite resource. Loop engineering that ignores this constraint fails. Loop engineering that designs around it succeeds.
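One cheap way to enforce the "keep CLAUDE.md concise" rule is a word-budget check in CI or a pre-commit step. This is a sketch: `check_word_budget` is a hypothetical helper and the 500-word default is an arbitrary starting point of mine, not a documented limit.

```python
from pathlib import Path
import tempfile
from typing import Tuple


def check_word_budget(path: Path, budget: int = 500) -> Tuple[int, bool]:
    """Return (word_count, within_budget) for an instructions file."""
    words = len(path.read_text(encoding="utf-8").split())
    return words, words <= budget


# Demo against a throwaway file instead of a real repo.
with tempfile.TemporaryDirectory() as tmp:
    claude_md = Path(tmp) / "CLAUDE.md"
    claude_md.write_text(
        "Always run npm test after modifying files in /src.\n", encoding="utf-8"
    )
    count, ok = check_word_budget(claude_md)
    _, ok_tight = check_word_budget(claude_md, budget=5)
```

Word count is only a proxy for context cost, but it is enough to stop the file from quietly growing into a wiki page.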
Loop Engineering vs Single-Shot Prompting
To make the difference concrete:
| Dimension | Single-Shot Prompting | Loop Engineering |
|---|---|---|
| Interaction model | One request, one response | Continuous iteration until goal met |
| Error handling | Manual — you read the error and re-prompt | Automatic — hooks re-inject context on failure |
| Verification | You check the output | Claude checks its own output |
| Context management | Entire conversation in one window | Subagents isolate phases into fresh contexts |
| Reusability | Copy-paste prompts between sessions | Skills persist as versioned files in your repo |
| CI/CD integration | Not possible | Headless mode enables autonomous pipelines |
| Team scaling | Each person writes their own prompts | Skills and CLAUDE.md encode team knowledge |
| Failure mode | Claude stops and you restart | Hooks prevent premature termination |
Single-shot prompting is using Claude as a search engine with better prose. Loop engineering is using Claude as an autonomous teammate. The difference in output quality isn't marginal. It's a category change.
kaleman15, writing on Dev.to, describes a skill progression arc from basic prompting to full delegation — what they call "gym badges of agentic engineering." Loop engineering sits at the top of that arc. It's where you stop doing the work and start designing the system that does the work.
Getting Started: Your First Agent Loop in 30 Minutes
Here's how to go from zero to a working agent loop:
Create a CLAUDE.md in your project root. Include your test command, your build command, and one rule: "Always run tests after modifying source files. If tests fail, fix and re-run." That's it. Don't overthink the first version.
Create your first skill at .claude/skills/loop/SKILL.md. Instructions: execute a task, verify with tests, iterate until green. Keep it under 200 words.
Add a Stop hook that checks whether the last test run passed. If not, inject the failures back into context with a message: "Tests are still failing. Continue fixing." This one hook alone will change how Claude behaves.
Test it with a real but small task: "Fix the failing test in user_service_test.py." Watch Claude iterate. See how many cycles it takes. You'll probably be surprised.
Create a subagent for investigation. When Claude needs to understand how a function is used across the codebase, it spawns a research subagent instead of flooding the main context with grep results.
Add a /review skill that tells Claude to review its own changes critically before considering the task complete.
That's your starter kit. Six components, thirty minutes, and you've fundamentally changed how Claude works in your codebase. Every one of these lives as a file in your repo — versioned, reviewed, shared with your team.
If you're already comfortable with Claude Code and want to go deeper, Anthropic's subagents documentation covers advanced patterns like nested subagents and fork-based parallel execution. The hooks reference has over 20 lifecycle events. There's likely one for whatever control flow you need.
What Loop Engineering Means for What Comes Next
The developer community is already splitting into two groups: people who use AI and people who engineer with AI. The tools are here. The documentation is thorough. The only thing missing is adoption.
My prediction: by the end of 2026, the engineers who've mastered loop engineering will be operating at 3-5x the throughput of those still doing single-shot prompting. Not because they're smarter or faster, but because they've built systems that iterate while they think about the next problem.
This is one of those things where the boring answer is actually the right one. Loop engineering isn't a revolutionary new concept. It's applying the same principles we've always used in software engineering — automation, feedback loops, verification, modular design — to how we work with AI agents. The tools just finally caught up to the idea.
Stop prompting. Start engineering loops.
Frequently Asked Questions
What is the difference between loop engineering and prompt engineering?
Prompt engineering focuses on crafting a single input to get the best possible single output from an LLM. Loop engineering goes further — it designs persistent systems (skills, hooks, CLAUDE.md files) that make Claude iterate autonomously through multiple cycles of action and verification until a measurable goal is met. Think of it as the difference between writing one good email and building an automated workflow.
How does Claude Code's agentic loop actually work?
Claude Code operates in three repeating phases: gather context (read files, search code), take action (edit files, run commands), and verify results (check tests, compare output). These phases blend together and repeat, with Claude chaining dozens of actions and course-correcting based on what it learns at each step. A single bug fix might cycle through all three phases multiple times.
Can Claude Code run autonomously in CI/CD pipelines?
Yes. Claude Code supports headless (non-interactive) mode and can be piped into scripts. Combined with auto mode, it can run on a schedule or be triggered by events like pull requests or dependency updates. You should scope permissions tightly and use hooks to prevent destructive operations when running without human oversight.
What is the biggest challenge with agent loops in Claude Code?
Context window exhaustion. Claude's performance degrades as the context fills up, which is especially problematic for long-running loops. The main mitigation strategies are: using subagents to isolate heavy operations into separate contexts, running /compact to summarize conversations, keeping CLAUDE.md concise, and designing loops where each phase starts in a fresh context.
How do hooks enable loop engineering in Claude Code?
Hooks are event-driven scripts that fire at specific points in Claude's lifecycle. For loop engineering, the most important hooks are PostToolUseFailure (re-inject context when something breaks), Stop (prevent Claude from ending before tests pass), and TaskCompleted (trigger the next pipeline step). Hooks can block, approve, or redirect Claude's actions programmatically.
Do I need external frameworks like LangChain to build agent loops with Claude Code?
No. Claude Code has native support for skills (reusable instruction sets), subagents (isolated parallel workers), and hooks (event-driven automation). These primitives give you full agent orchestration without any external dependencies. Everything is defined in markdown and JSON files that live in your repository alongside your code.