🤖 The Agentic Loop 🔄 Loop Engineering: A Practical Field Guide 📘
DEV Community

⚡ TL;DR

An agentic loop is the simplest unit of useful agent work: do something → check the result → decide whether to continue or stop. The whole craft is in making the check real and defining when to stop. Everything else - model choice, harness, MCPs, subagents - is secondary.

If you remember one sentence: A loop is a task with a check. A task without a check is just hope.

Zoom out and the same idea has a name: loop engineering - designing the system that prompts your agent on a schedule and against a goal, instead of typing every prompt yourself. As Anthropic's Boris Cherny put it, "My job is to write loops."

This guide takes you from one good loop to a self-running one - and tells you where the brakes are.

1. 🔄 What an agentic loop actually is

Most people picture "an agent" as a chatbot that writes code in one shot. That's a one-time task. A loop is different. The agent:

  • Observes the current state (reads files, runs a test, takes a screenshot).
  • Takes one bounded action (changes one thing).
  • Checks what happened against a fixed standard.
  • Decides - continue, stop because it succeeded, or stop because it's blocked or out of budget.
┌─────────────────────────────────────────────────┐
│                                                 │
▼                                                 │
┌──────────┐   ┌──────────┐   ┌──────────┐   ┌────┴─────┐
│ OBSERVE  │──▶│   ACT    │──▶│  CHECK   │──▶│  DECIDE  │
│ (inputs) │   │ (1 step) │   │ (fixed)  │   │ continue │
└──────────┘   └──────────┘   └──────────┘   │ /stop?   │
                                             └────┬─────┘
                                                  │ stop
                                                  ▼
                                        ┌───────────────────┐
                                        │ HANDOFF / REPORT  │
                                        └───────────────────┘
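A minimal shell sketch of that cycle follows. It is an illustration under stated assumptions, not a harness: `observe`, `act`, and `check` are hypothetical placeholders you would swap for real commands (reading files, calling your agent CLI, running your test suite), and `MAX_STEPS` is an assumed budget.

```shell
# Minimal observe -> act -> check -> decide skeleton.
# observe, act and check are hypothetical placeholders; replace them with real commands.
MAX_STEPS=${MAX_STEPS:-10}      # assumed budget: hard cap on passes

observe() { cat state.txt 2>/dev/null || true; }   # read fresh state on every pass
act()     { :; }                # make ONE bounded change; receives the observed state as $1
check()   { return 1; }         # fixed check: exit 0 means success

run_loop() {
  local step
  step=0
  while [ "$step" -lt "$MAX_STEPS" ]; do
    step=$((step + 1))
    act "$(observe)"                      # OBSERVE, then ACT (one step)
    if check; then                        # CHECK against the fixed standard
      echo "done after $step step(s)"     # DECIDE: stop, success
      return 0
    fi
  done                                    # DECIDE: otherwise continue
  echo "stopped: budget of $MAX_STEPS step(s) exhausted"   # DECIDE: stop, out of budget
  return 1
}
```

Each stop path prints one line and returns a distinct exit code, so whatever runs the loop can tell success from an exhausted budget.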

Use a loop when the result of one step should change the next step. If it won't, use a one-time task instead. (Forward Future, How agent loops work.)

This is why "improve the code" fails and "make every page load under 50ms under the same test conditions" works. The first has no finish line; the second has a check the agent can run after every change, and a number that says done.
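Here is a sketch of a fixed acceptance check for the 50ms goal, as an example of a check with a number that says done. It assumes a hypothetical input format, one `page milliseconds` pair per line, produced by whatever benchmark you run under fixed conditions; the function name is also illustrative. It exits 0 only when every page is under the threshold, so the agent can run it after every change and read one verdict.

```shell
# Fixed acceptance check: succeed only if every page loads under the threshold.
# Input format (an assumption): one "page milliseconds" pair per line on stdin.
check_page_times() {
  local threshold_ms=${1:-50}
  awk -v limit="$threshold_ms" '
    NF < 2 { next }                                   # skip blank or malformed lines
    { total++ }
    ($2 + 0) >= (limit + 0) { slow++; printf "SLOW %s %sms\n", $1, $2 }
    END {
      if (total == 0) { print "FAIL: no measurements"; exit 1 }
      if (slow > 0)   { printf "FAIL: %d of %d page(s) at or over %sms\n", slow, total, limit; exit 1 }
      printf "PASS: all %d page(s) under %sms\n", total, limit
    }'
}
```

Usage: `./bench-pages | check_page_times 50`, where `bench-pages` stands in for your own benchmark. A run with no measurements fails on purpose, so a broken benchmark can't pass as a fast site.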

🔁 Inner loop vs. outer loop

The cycle above is the inner loop - what a coding agent already runs on every turn: it perceives the state, reasons about what to do, acts (calls a tool, edits a file, runs a test), observes the result, and reasons again. You don't build that; the harness does.

What you build is the outer loop: the system that runs that inner loop on a schedule, feeds it work, checks the result, and decides the next thing - without you typing each prompt.

Everything past this section is about designing that outer loop well.

2. 🔧 From prompting to loop engineering

In June 2026 this pattern got a name. Addy Osmani called it loop engineering, crystallizing what Peter Steinberger and Anthropic's Boris Cherny (head of Claude Code) had been saying:

"You shouldn't be prompting coding agents anymore. You should be designing loops that prompt your agents." - Peter Steinberger

"I don't prompt Claude anymore. I have loops running that prompt Claude and figuring out what to do. My job is to write loops." - Boris Cherny

It's the third layer in a stack that's been building for years. Each layer wraps the one inside it and moves the leverage point further from the raw model call:

| Layer | What you optimize | Unit of work |
|---|---|---|
| Prompt engineering | how you phrase one instruction | one turn you type by hand |
| Context engineering | what else is in the window: docs, history, tool defs | the conditions around one answer |
| Loop engineering | the system that decides what to prompt, when, and whether the result passes | a self-running cycle across many turns |

The lower layers don't disappear - a sloppy prompt inside a loop just produces sloppy work faster, and the loop still has to put the right files in front of the model each turn. What loop engineering adds is the autonomous control structure around all of it.

The leverage moved; the work didn't get easier. A well-designed loop multiplies a good engineer. A badly designed one multiplies a bad decision just as fast, with less of you watching.

3. 🐣 Where the loop began: the Ralph technique

Before it had a name, there was Ralph. In July 2025 Geoffrey Huntley described running a coding agent inside a plain while loop and named it after Ralph Wiggum - "deterministically simple in an unpredictable world." It looks too dumb to work, and it works. (Huntley built an entire programming language with it for about $297.)

# The original Ralph loop: same prompt, fresh context, until done
while ! grep -q "ALL TASKS DONE" STATUS.md ; do
  # each pass is a brand-new agent with an empty context window
  claude -p "Read PLAN.md and STATUS.md. Pick the next unchecked task,
    implement it, run the tests, commit on success, and update STATUS.md.
    Then stop."
done

The non-obvious insight is the context reset. A long session degrades as the window fills with old reasoning, dead ends, and stale file contents. Ralph sidesteps that: every iteration is a fresh agent with a clean context that reads the current repo state and task list from disk, does exactly one unit of work, commits, and exits.

The intelligence doesn't live in one heroic run - it lives in clear, granular specs and verifiable outcomes, applied over and over against an external memory the model can't pollute.
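If you want the same shape with a brake, one option is to cap the number of passes. This is a sketch, not Huntley's script: `run_agent`, `MAX_PASSES`, and the cap itself are illustrative additions, and `run_agent` simply wraps the same `claude -p` call so you can swap in another agent CLI.

```shell
# Ralph with a brake: same prompt, fresh agent each pass, plus a hard pass limit.
# run_agent and MAX_PASSES are illustrative additions, not part of the original loop.
MAX_PASSES=${MAX_PASSES:-25}
PROMPT='Read PLAN.md and STATUS.md. Pick the next unchecked task,
implement it, run the tests, commit on success, and update STATUS.md.
Then stop.'

# Each call is a new agent process, so no context carries over between passes.
run_agent() { claude -p "$1"; }

ralph() {
  local pass
  pass=0
  # -s: no error message if STATUS.md does not exist yet
  while ! grep -qs "ALL TASKS DONE" STATUS.md; do
    if [ "$pass" -ge "$MAX_PASSES" ]; then
      echo "stopped: $MAX_PASSES passes without ALL TASKS DONE" >&2
      return 1
    fi
    pass=$((pass + 1))
    run_agent "$PROMPT"
  done
  echo "done after $pass pass(es)"
}
# Usage, from the repository root:  ralph
```

Because every pass is a separate process, the only memory between passes is what the agent wrote to disk, which is the point of the technique.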

Loop engineering is Ralph, productized. The while loop becomes a scheduled automation, the context reset becomes a worktree plus a sub-agent, and the ALL TASKS DONE grep becomes a /goal condition graded by a separate model. Same shape, fewer sharp edges.

📊 The five-stage lineage

Ralph didn't appear from nowhere - and what Steinberger and Cherny mean today isn't Ralph either. The word loop hides at least five distinct things. Knowing where you are on this ladder is the fastest way to stop talking past people:

| Stage | When | What it was | What it added |
|---|---|---|---|
| 1. ReAct | 2022 | the academic while-loop: reason → act → observe → repeat | one model, one loop, a human watching |
| 2. AutoGPT | 2023 | gave the loop a goal and let it prompt itself | autonomy - and infamous infinite spinning |
| 3. Ralph | Jul 2025 | a bash one-liner piping the same prompt, fresh context each pass | discipline: reset context to fixed anchor files |
| 4. /goal | spring 2026 | Ralph productized in Codex & Claude Code; runs until a validator model confirms done | a built-in verifiable stop |
| 5. Orchestration | now | loops supervising loops, on a schedule, with durable git-backed state | the loop becomes the unit of work |

Stages 1–4 are single-agent. Stage 5 is what's genuinely new:

  • The loop became the unit of work (not the task)
  • Loops started supervising other loops concurrently and on a schedule
  • Scheduling replaced the human kickoff (so it runs on infrastructure time, not your attention)
  • Durability became explicit (git-backed state and crash recovery, because Ralph assumed your terminal stayed open and the 2026 version assumes it does not)
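Durability mostly comes down to writing state so a crash mid-write can't corrupt it. A minimal sketch, assuming a hypothetical `loop-state.env` file of `KEY=value` lines: write to a temporary file in the same directory, then `mv` it over the old one, since a rename within one filesystem is atomic on POSIX systems. Committing that file to git, as the git-backed setups do, would be a separate step.

```shell
# Crash-safe loop state: write to a temp file, then rename it over the old one.
# STATE_FILE and its KEY=value layout are assumptions for illustration.
STATE_FILE=${STATE_FILE:-loop-state.env}

save_state() {   # usage: save_state <iteration> <last_task>
  local tmp
  tmp=$(mktemp "$STATE_FILE.XXXXXX") || return 1     # same directory, so same filesystem
  if ! printf 'ITERATION=%s\nLAST_TASK=%s\n' "$1" "$2" > "$tmp"; then
    rm -f "$tmp"
    return 1
  fi
  mv -f "$tmp" "$STATE_FILE"   # rename: readers see the old file or the new one, never half
}

load_state() {   # print the last saved iteration, or 0 when there is no state yet
  if [ -f "$STATE_FILE" ]; then
    sed -n 's/^ITERATION=//p' "$STATE_FILE"
  else
    echo 0
  fi
}
```

On restart, the loop calls `load_state` and resumes from the next iteration instead of starting over.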

"It's just cron with a hat on" - half right.

The sharpest skeptic line in the whole discourse was six words: "Cronjobs have funny re-branding right now." And yes, the scheduling layer is cron - Claude Code's /loop runs on cron under the hood. What cron never had is the body. A cron job runs a fixed script; a loop runs a model that reads the current state, decides what to do next, does it, checks whether it worked, and decides whether to continue. A loop is cron plus a decision-maker in the body.
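In practice the cron half is one crontab line and the body is the part that decides. A sketch under assumptions: the `loop-tick.sh` path and the `decide_and_act` body are hypothetical, and the `mkdir` lock (atomic on POSIX filesystems) is one common way to stop a slow tick from overlapping the next, which cron alone won't do.

```shell
# Crontab entry (hypothetical path): one tick every five minutes.
#   */5 * * * * cd /path/to/repo && ./loop-tick.sh >> loop.log 2>&1
# loop-tick.sh would contain the functions below and end with a call to: tick

LOCK_DIR=${LOCK_DIR:-.loop.lock}

# Placeholder for the model-driven body: read state, pick one bounded task, do it, check it.
decide_and_act() { echo "tick: nothing to do"; }

tick() {
  # mkdir creates the directory or fails atomically, so it doubles as a portable lock.
  if ! mkdir "$LOCK_DIR" 2>/dev/null; then
    echo "skip: previous tick still running"
    return 0
  fi
  trap 'rmdir "$LOCK_DIR" 2>/dev/null' EXIT   # release the lock if the body exits the script early
  local status
  if decide_and_act; then status=0; else status=$?; fi
  rmdir "$LOCK_DIR"
  trap - EXIT
  return "$status"
}
```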

Stack those - let one loop dispatch and supervise others with durable shared state - and you get something cron can't express. The open-source proof is Steve Yegge's Gas Town: 20–30 Claude Code instances coordinated by a "Mayor" agent, patrol agents running continuous loops, and state in git so work survives a crash.

🗺️ What stage-5 orchestration looks like

Here is the flow, step by step: a scheduler tick wakes the Mayor (the outer loop), which hands each patrol agent one bounded task in its own worktree. Each patrol agent runs its own inner observe → act → check cycle, then a verifier gates the result: failures bounce back to the Mayor for rework, and passes are committed to durable git state. The next tick reads that state and picks up where the last one stopped. The Mayor enforces the three hard stops (max iterations, no-progress, budget) so the whole thing halts instead of running off a cliff.
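The Mayor's control logic, reduced to dispatch, verification, and the three hard stops, can be sketched in one function. This is not Gas Town's code: `next_task`, `run_patrol`, `verify`, and `commit_result` are hypothetical hooks, "no progress" is approximated as several verification failures in a row, and the budget is counted in abstract cost units rather than real tokens or dollars.

```shell
# Mayor loop with the three hard stops: max iterations, no progress, budget.
# next_task, run_patrol, verify and commit_result are hypothetical hooks to replace.
MAX_ITERATIONS=${MAX_ITERATIONS:-20}
MAX_IDLE=${MAX_IDLE:-3}             # verification failures in a row before halting
BUDGET=${BUDGET:-100}               # abstract cost units, not real tokens or dollars
COST_PER_TASK=${COST_PER_TASK:-10}

next_task()     { return 1; }       # print the next bounded task; fail when the queue is empty
run_patrol()    { :; }              # a patrol agent works on "$1" in its own worktree
verify()        { return 1; }       # independent checker: exit 0 only if "$1" passes
commit_result() { :; }              # persist the verified result, e.g. as a git commit

mayor() {
  local iteration idle spent task
  iteration=0; idle=0; spent=0
  while task=$(next_task); do
    if [ "$iteration" -ge "$MAX_ITERATIONS" ]; then echo "halt: max iterations"; return 1; fi
    if [ $((spent + COST_PER_TASK)) -gt "$BUDGET" ]; then echo "halt: budget"; return 1; fi
    iteration=$((iteration + 1))
    spent=$((spent + COST_PER_TASK))
    run_patrol "$task"
    if verify "$task"; then
      commit_result "$task"         # pass: becomes durable state
      idle=0
    else
      idle=$((idle + 1))            # fail: next_task decides whether to hand it back for rework
      if [ "$idle" -ge "$MAX_IDLE" ]; then echo "halt: no progress"; return 1; fi
    fi
  done
  echo "queue empty after $iteration iteration(s)"
}
```

Each halt prints its reason and returns non-zero, so the next scheduler tick (or a human) can see why the run stopped.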

4. 🚀 Why this matters right now

The capability bar moved. Practitioners report that agentic coding went from "this is crap" to "this is good" around mid-2025, and from good to "this is amazing" with the newest frontier coding models. The practical consequence, in Steinberger's words: the amount of software you can create is now mostly limited by inference time and hard thinking - not by typing.

That shifts where your effort goes. The bottleneck is no longer writing code; it's specifying the goal and the check precisely enough that an agent can run unattended and you can trust the result. The agentic loop is the format that encodes exactly those two things.

A second reason it matters: closing the loop. The recurring theme across every credible source is that agents get dramatically more reliable when they can verify their own work - run the CLI, run the test, diff the screenshot, hit the endpoint. Whatever you build, build it so the agent can check itself.

"By default, whatever I wanna build, it starts as a CLI. Agents can call it directly and verify output - closing the loop." (Steinberger)

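In the same spirit, even a one-off check is easier for an agent to use when it behaves like a small CLI: one verdict line on stdout and an exit code that means pass or fail. A minimal sketch; `run_check` is a hypothetical helper name and the commands you pass it are your own.

```shell
# Wrap any check so an agent gets a one-line verdict and a meaningful exit code.
# run_check is a hypothetical helper: run_check <label> <command> [args...]
run_check() {
  local label output status
  label=$1
  shift
  if output=$("$@" 2>&1); then status=0; else status=$?; fi
  if [ "$status" -eq 0 ]; then
    echo "PASS $label"
  else
    # keep only the last line of output so the verdict stays short
    echo "FAIL $label (exit $status): $(printf '%s\n' "$output" | tail -n 1)"
  fi
  return "$status"
}
# Usage: run_check unit-tests npm test
```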
5. 🏗️ The anatomy of a good loop

Every reliable loop names five things explicitly. Miss one and the loop drifts, runs forever, or "succeeds" while tests fail.

| Part | Question it answers | Failure if missing |
|---|---|---|
| Trigger | When does the loop run? | Never starts, or runs at the wrong time |
| Inputs | What fresh state does the agent inspect each pass? | Acts on stale assumptions |
| Action | What single bounded, reversible change may it make? | Huge blast radius, impossible to undo |
| Check | What fixed test/benchmark/rubric decides success? | "Looks done" while broken |
| Stop | Success? No-op? Blocked? Out of budget? | Infinite loop, wasted tokens, runaway authority |
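One cheap way to enforce "name all five" is to refuse to start a loop whose spec leaves any part blank. A sketch, assuming the spec lives in shell variables named after the table's rows; the `LOOP_*` names and the example values are illustrative.

```shell
# Refuse to start a loop unless all five parts are spelled out.
# The LOOP_* names and the example values below are illustrative.
LOOP_TRIGGER="nightly at 02:00, or when CI on main turns red"
LOOP_INPUTS="the latest CI log and git log since the last green run"
LOOP_ACTION="fix one failing test per pass, in a single commit"
LOOP_CHECK="the full test suite passes twice in a row"
LOOP_STOP="suite green, 10 passes used, or the same test fails 3 times"

validate_loop_spec() {
  local part value missing
  missing=0
  for part in TRIGGER INPUTS ACTION CHECK STOP; do
    eval "value=\${LOOP_$part-}"                               # indirect lookup of LOOP_<part>
    if [ -z "$(printf '%s' "$value" | tr -d ' \t')" ]; then    # unset, empty, or blank
      echo "missing: $part" >&2
      missing=1
    fi
  done
  return "$missing"
}
# Usage, at the top of a tick script: validate_loop_spec || exit 1
```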

📐 The four design rules (from How agent loops work)

  1. Start with a measurable goal. Describe the result so you can review or measure it.
  2. Keep each action small. One bounded, reversible change at a time - easier to verify, easier to undo.
  3. Use a fixed check. Run the same test/benchmark/rubric/approval after every change. The check - not the agent's opinion - determines whether the work improved.
  4. Define how it stops. Success, no-op, ask-for-approval, and blocked/out-of-budget must all be spelled out.
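Rule 4 is the one most often left implicit. A sketch of spelling it out: each pass reports a few yes/no facts, and one function maps them to exactly one outcome. The argument names and the precedence (a passing check wins, then escalation, then blockers, then budget) are assumptions to adapt, not a standard.

```shell
# Map the facts from one pass to exactly one outcome. Every argument is 1 (yes) or 0 (no).
# Argument names and their precedence are illustrative assumptions:
#   $1 check_passed  the fixed check passes right now
#   $2 changed_any   this run has changed something so far
#   $3 needs_human   the next action is on the escalation list
#   $4 blocked       a tool, credential, or dependency is unavailable
#   $5 budget_left   iterations or tokens remain
decide() {
  if [ "$1" -eq 1 ]; then
    if [ "$2" -eq 1 ]; then echo "stop: success"; else echo "stop: no-op"; fi
  elif [ "$3" -eq 1 ]; then echo "stop: ask for approval"
  elif [ "$4" -eq 1 ]; then echo "stop: blocked"
  elif [ "$5" -eq 0 ]; then echo "stop: out of budget"
  else echo "continue"
  fi
}
```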

6. 📝 The universal loop template

This single prompt shape works across Cursor, Codex, Claude Code, Factory, Devin - anything. Fill the brackets:

When [trigger], inspect [fresh inputs]. Choose one in-scope action using
[criteria], then make the change. Run [acceptance check] under the same
conditions. Record what changed, the evidence, and the next step in
[state file]. Repeat only while progress is measurable and [budget]
remains. Stop when [success gate] passes. Stop without changes when
[no-op condition] is true. Ask for approval or report a blocker when
[escalation condition] occurs. Never [forbidden action]. Finish with
[pull request, report, artifact, or handoff].

Run it once by hand before you schedule it. The first manual run almost always reveals a missing check, a fuzzy boundary, or a stop condition that needs to be sharper. (Forward Future)
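Once a team shares loops, it can help to keep the bracket values in variables and render the prompt from them, so the manual first run and the scheduled runs use identical text. A sketch; the function name and the example values (a fix-the-red-CI loop) are made up for illustration.

```shell
# Render the universal loop template from named values (all values are examples).
render_loop_prompt() {
  cat <<EOF
When $TRIGGER, inspect $INPUTS. Choose one in-scope action using
$CRITERIA, then make the change. Run $CHECK under the same
conditions. Record what changed, the evidence, and the next step in
$STATE_FILE. Repeat only while progress is measurable and $BUDGET
remains. Stop when $SUCCESS passes. Stop without changes when
$NOOP is true. Ask for approval or report a blocker when
$ESCALATION occurs. Never $FORBIDDEN. Finish with
$FINISH.
EOF
}

TRIGGER="CI on main fails"
INPUTS="the failing job log and git log since the last green run"
CRITERIA="the smallest change that addresses the first failing test"
CHECK="the full test suite"
STATE_FILE="PROGRESS.md"
BUDGET="an iteration budget of 10 passes"
SUCCESS="the full test suite"
NOOP="the latest run on main is already green"
ESCALATION="a fix would change a public API"
FORBIDDEN="skip, delete, or weaken a test"
FINISH="a pull request that links the evidence"

# First run by hand, before scheduling it:
#   claude -p "$(render_loop_prompt)"
```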

🍦 Two flavors

  • Goal loop - starts manually, runs until the check passes or the budget runs out. (e.g., "stabilize the test suite.")
  • Scheduled loop - starts on a timer or event, does its bounded work, reports, and waits for the next trigger. (e.g., Steinberger's repository maintainer, which wakes every five minutes, triages repos, assigns the highest-value bounded task, and requires green CI before anything lands.)

7. 🧱 The five building blocks of a self-running loop

A year ago a loop meant a pile of bash you maintained forever. As of mid-2026 the pieces ship inside the products - and the shape is the same across OpenAI Codex and Anthropic's Claude Code, so you stop arguing about which tool and just design a loop that works in either.

A loop needs five blocks plus one place to remember state.

| # | Block | What it does | In Codex | In Claude Code |
|---|---|---|---|---|
| 1 | Automations | scheduled discovery + triage | Automations tab (project, prompt, cadence, env); Triage inbox; /goal | /loop, scheduled tasks/cron, hooks, GitHub Actions, /goal |
| 2 | Worktrees | isolate parallel agents | built-in worktree per thread | git worktree, --worktree, isolation: worktree on a subagent |
| 3 | Skills | codify project knowledge | SKILL.md, called with $name or implicitly | Agent Skills (SKILL.md) |
| 4 | Connectors | reach your real tools | Connectors (MCP) + plugins | MCP servers + plugins |
| 5 | Sub-agents | separate maker from checker | TOML in .codex/agents/ | .claude/agents/, agent teams |
| + | Memory | durable state between runs | markdown / Linear via connector | markdown (AGENTS.md, progress files) / Linear |
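To make "separate maker from checker" concrete on the Claude Code side, the checker can be a subagent file under `.claude/agents/` and the maker can work in its own git worktree. A sketch, assuming the Markdown-with-YAML-frontmatter subagent format (`name`, `description`, `tools`) from Claude Code's documentation; the agent name, its prompt, and the branch and directory names are illustrative, so check the current docs for the exact fields.

```shell
# Scaffold a read-only verifier subagent (hypothetical name and prompt).
# Format assumed: Markdown file with YAML frontmatter, as in Claude Code's subagent docs.
make_verifier_agent() {
  mkdir -p .claude/agents
  cat > .claude/agents/verifier.md <<'EOF'
---
name: verifier
description: Independent checker. Use after any change to decide pass or fail.
tools: Read, Grep, Glob, Bash
---
You did not write this change. Run the project's acceptance check exactly as
documented, compare the result with the stated success gate, and reply with
PASS or FAIL plus the evidence. Do not edit files.
EOF
}

# Give the maker an isolated checkout on its own branch (names are examples):
#   git worktree add -b loop/task-42 ../wt-task-42
```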
