I Made Claude Code Think Before It Codes. Then I Gave It a Team.
The Shift: From a Senior Developer to a Senior Architect Who Runs a Team
Here's the thing about a disciplined senior developer working alone: they're still working alone. They read carefully, test thoroughly, and self-review, but they do it sequentially, one task at a time, blocking on every code-review round trip. v1 was a fantastic individual contributor, and an individual contributor has a ceiling: one pair of hands.
A real senior architect doesn't sit in the editor all day. They decompose a problem into separable concerns, write the contract everyone builds against, and hand the backend, the UI, and the test coverage to people who work at the same time. They plan, dispatch, integrate, and keep the review pipeline moving, and almost never write the code themselves.
That's v2: the mental model went from "make Claude a senior developer" to "make Claude a senior architect who runs a team," and, for me, from running the team to conducting it.
Concretely: a single main-thread orchestrator that never writes code itself, fanning work out to specialist subagents, each in its own isolated git worktree, building in parallel, driving many pull requests at once through an automated review gate.
The emotional payoff that genuinely surprised me the first time I watched it: I'd describe a feature, walk away, and come back to find an engineering team had been shipping while I slept.
From v1 to v2: What Changed
If you used the original /wizard, here's the whole upgrade in one table:
| | v1 (one disciplined developer) | v2 (an architect running a team) |
|---|---|---|
| From idea to work | You hand-write the ticket | An issue-maintainer agent turns a one-line idea into a structured issue or epic |
| Who writes the code | One Claude does everything | An orchestrator that writes no code; specialist subagents implement |
| Concurrency | One task at a time | A cohort of up to ten pull requests open and moving at once |
| Build shape | Read, test, implement, review, sequentially | Architect designs the contract; backend and frontend build in parallel off it, using TDD; a QA specialist verifies |
| Review gate | Monitor your code-review bot, fix findings, repeat | An independent reviewer that didn't build it; findings routed back to the specialist whose layer they live in, across all PRs at once |
| Your role | Architect who stopped writing code and reviewing it line by line | Conductor who also stopped writing the prompts and the issues, and now tunes the workflow |
Everything that made v1 work is still here, underneath. v2 doesn't replace the discipline. It distributes it across a team and runs that team in parallel. In fact, v1 is preserved verbatim as "direct mode," and the team only spins up when the work is complex enough to be worth it. A one-line fix never pays the team tax.
Let me walk through the pieces.
It Starts Before the Code: An Agent That Turns Ideas Into Issues
The very front of the pipeline is the part I underestimated longest. I used to write the tickets. A feature would occur to me in the shower, and the price of acting on it was turning a vague sentence into a well-formed issue: title, acceptance criteria, labels, a link to the parent epic. That quietly throttles everything downstream. A sloppy ticket produces sloppy output no matter how good the team is.
So that became an agent too. An issue-maintainer takes a one-line idea ("let an admin turn on a guided walkthrough for new users") and produces a structured issue: a clear title, explicit acceptance criteria, consistent labels, and the parent-to-sub-issue links that tie an epic to its pieces.
I stopped formatting tickets the same way I stopped formatting code. The point isn't that it saves me ten minutes of typing. The point is consistency. When every issue has the same shape, same label vocabulary, acceptance criteria written the same way, the same epic-to-subtask structure, the rest of the machine runs on a clean, uniform source of truth: the orchestrator picks up any issue and immediately knows what "done" means, and the builders inherit acceptance criteria they can write a failing test against.
Consistent issues are the rail the whole train rides on. Idea to issue is the first agent step, not something I do by hand before the agents start.
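To make "same shape" concrete, here is a minimal sketch of what a structured issue and its consistency check might look like. The fields mirror the ones named above (title, acceptance criteria, labels, parent epic); the `Issue` class, the `ALLOWED_LABELS` vocabulary, the `validate_issue` helper, and the sample acceptance criterion are all hypothetical illustrations, not the actual issue-maintainer's schema.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical label vocabulary; a real project would define its own.
ALLOWED_LABELS = {"feature", "bug", "admin", "onboarding"}


@dataclass
class Issue:
    title: str
    acceptance_criteria: list[str]
    labels: list[str]
    parent_epic: Optional[int] = None  # issue number of the parent epic, if any


def validate_issue(issue: Issue) -> list[str]:
    """Return a list of shape problems; an empty list means well-formed."""
    problems = []
    if not issue.title.strip():
        problems.append("missing title")
    if not issue.acceptance_criteria:
        problems.append("no acceptance criteria")
    unknown = sorted(set(issue.labels) - ALLOWED_LABELS)
    if unknown:
        problems.append(f"unknown labels: {unknown}")
    return problems


idea = Issue(
    title="Let an admin turn on a guided walkthrough for new users",
    # Illustrative criterion, invented for this sketch.
    acceptance_criteria=["An admin can enable the walkthrough from settings"],
    labels=["feature", "admin"],
)
print(validate_issue(idea))  # prints []
```

A check like this is what lets a downstream agent assume that every issue it picks up has testable acceptance criteria and a known label set.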
The Orchestrator / Worker Split (and Why the Boundary Is git commit)
The single most important design decision in v2 is the line between the orchestrator and the workers, and where exactly it sits.
The orchestrator is the main thread: the Claude you actually talk to. It plans, dispatches subagents, monitors the pipeline, and integrates results. It does not open an editor on application code. The moment it does, it stops orchestrating, burning the context that ten parallel pull requests depend on and serializing work three specialists could have done at once.
The workers are subagents: each gets a focused brief, does the implementation, runs the affected tests, and commits locally, then returns one result message: branch name, final commit SHA, what it touched.
The handoff boundary is exactly git commit. The subagent commits; the orchestrator does everything from git push onward: push, open the PR, run the review cycle. A commit acts as the checkpoint, loosely like the commit point in a two-phase commit, between local work (fully reversible) and external commitments (CI fires, reviewers get notified, check-runs get recorded against that SHA).
Splitting responsibility there buys three concrete things:
- You verify the diff before you expose it: a worktree cut at dispatch can go stale if siblings merge, and a quick fetch && rebase catches the phantom-deletion diff before it confuses anyone.
- You get clean failure recovery: a subagent that crashes mid-task has pushed nothing, so the orchestrator just salvages the working tree instead of cleaning up a half-built PR.
- You get a single monitoring owner: exactly one entity knows the state of every in-flight PR, so it declares a PR ready exactly once and composes the title and description from cross-cutting context a subagent never has.
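The split can be sketched as a data shape plus a plan. This is a rough illustration under stated assumptions: `WorkerResult` and `publish_plan` are hypothetical names, the base branch is assumed to be `main`, the remote is assumed to be `origin`, and the PR is assumed to be opened with the GitHub CLI. The function only builds the command lists; it does not run them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerResult:
    """The single result message a subagent returns after committing locally."""
    branch: str
    commit_sha: str
    files_touched: tuple[str, ...]


def publish_plan(result: WorkerResult, title: str, body: str,
                 base: str = "main") -> list[list[str]]:
    """Everything from push onward, owned by the orchestrator.

    Nothing external happens until the third command, so a stale
    worktree is caught by the fetch and rebase before anyone sees it.
    """
    return [
        ["git", "fetch", "origin", base],
        ["git", "rebase", f"origin/{base}", result.branch],
        ["git", "push", "--set-upstream", "origin", result.branch],
        ["gh", "pr", "create", "--base", base, "--head", result.branch,
         "--title", title, "--body", body],
    ]


result = WorkerResult("feat/walkthrough", "abc1234", ("app/service.py",))
for cmd in publish_plan(result, "Add walkthrough toggle", "See linked issue"):
    print(" ".join(cmd[:3]))
```

Note that a rebase rewrites the commit, so the SHA the subagent reported is only a starting reference; whatever the orchestrator pushes is what CI and reviewers will see.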
The Ensemble: An Architect, Then Builders in Parallel, Then Critics
Here's where it stops looking like one assistant and starts looking like a team with a roster. When the orchestrator gets a non-trivial piece of work, it doesn't dispatch "a builder." By this point the issue-maintainer has already turned the raw idea into a structured GitHub issue with acceptance criteria, so the ensemble has something concrete to build against.
From there it runs in a deliberate order:
The architect goes first, and writes no production code. Its job is to design the subsystem, enumerate the invariants, run the concurrency analysis (what happens if this runs twice at once? what must stay true across every path that touches this data?), and produce two artifacts: a failing-test spec encoding the acceptance criteria (the ones the issue-maintainer wrote into the issue, now made executable), and a data contract - every field the UI and backend will exchange, with its type, range, and default. It's read-only; it designs, it does not build. That contract is the seam that keeps the team honest: every builder's output is checked against a concrete failing test the architect specified, not the builder's own loose reading of the brief.
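A data contract of this kind might be sketched as follows. The field names (`walkthrough_enabled`, `step_count`), their ranges and defaults, and the `FieldSpec` and `conform` helpers are assumptions made up for illustration; the point is only that type, range, and default are written down once and checked mechanically.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class FieldSpec:
    name: str
    type_: type
    default: Any
    minimum: Optional[float] = None
    maximum: Optional[float] = None


# Hypothetical contract for the walkthrough feature.
CONTRACT = (
    FieldSpec("walkthrough_enabled", bool, False),
    FieldSpec("step_count", int, 3, minimum=1, maximum=10),
)


def conform(payload: dict) -> dict:
    """Apply defaults and reject anything that violates the contract."""
    unknown = set(payload) - {spec.name for spec in CONTRACT}
    if unknown:
        raise KeyError(f"fields not in contract: {sorted(unknown)}")
    out = {}
    for spec in CONTRACT:
        value = payload.get(spec.name, spec.default)
        # Exact type match, so a bool is not silently accepted as an int.
        if type(value) is not spec.type_:
            raise TypeError(f"{spec.name} must be {spec.type_.__name__}")
        if spec.minimum is not None and value < spec.minimum:
            raise ValueError(f"{spec.name} below {spec.minimum}")
        if spec.maximum is not None and value > spec.maximum:
            raise ValueError(f"{spec.name} above {spec.maximum}")
        out[spec.name] = value
    return out


print(conform({"walkthrough_enabled": True}))
# prints {'walkthrough_enabled': True, 'step_count': 3}
```

Because both sides validate against the same `CONTRACT`, the frontend can build against the shape before the backend exists.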
Then the builders go, in parallel, off that one contract. A backend specialist takes the services, models, and migrations; a frontend specialist takes the UI; a QA specialist authors the coverage. They run simultaneously. The frontend doesn't wait for the backend, because it already knows the exact shape of the data it'll receive. Each owns a non-overlapping set of files, so they never collide in the same tree. A genuinely single-domain change collapses to one builder, but splitting is the default, not the exception.
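The non-overlapping ownership rule is easy to check before dispatch. Here is a small sketch, assuming the orchestrator holds a mapping from specialist to the files it plans to touch; the `ownership_conflicts` helper and the file paths are hypothetical.

```python
from itertools import combinations


def ownership_conflicts(assignments: dict[str, set[str]]) -> dict[tuple[str, str], set[str]]:
    """Return every pair of builders that claims at least one common file."""
    conflicts = {}
    for a, b in combinations(sorted(assignments), 2):
        shared = assignments[a] & assignments[b]
        if shared:
            conflicts[(a, b)] = shared
    return conflicts


plan = {
    "backend": {"app/models.py", "app/services.py"},
    "frontend": {"ui/settings.tsx"},
    "qa": {"tests/test_walkthrough.py"},
}
print(ownership_conflicts(plan))  # prints {}
```

An empty result means the builders can run simultaneously without colliding; any non-empty result is a signal to re-slice the work or collapse it to one builder.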
Then the critics verify, and crucially, they didn't build it. This is generator/evaluator separation, and it matters: the agents that wrote the change are not the agents that sign off. The QA specialist comes back after the code is green, applies a mutation-testing mindset (don't assert "it worked," assert the specific value and exact count that would break if the code mutated), and confirms the acceptance criteria are actually covered.
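The difference between asserting "it worked" and asserting the specific value and exact count shows up clearly in a toy case. Everything below is invented for illustration: a small function under test, a hand-written mutant with the admin filter removed, and two checks of different strength.

```python
def flag_for_walkthrough(users, admin_ids):
    """Toy function under test: flag every non-admin user for the walkthrough."""
    return [{**u, "walkthrough": True} for u in users if u["id"] not in admin_ids]


def mutant(users, admin_ids):
    """A plausible mutation: the admin filter has been dropped."""
    return [{**u, "walkthrough": True} for u in users]


def weak_check(result):
    # "It worked": passes for almost any non-empty output.
    return len(result) > 0


def strong_check(result):
    # Pins the exact count and the specific values.
    return len(result) == 2 and [u["id"] for u in result] == [2, 3]


USERS = [{"id": 1}, {"id": 2}, {"id": 3}]
ADMINS = {1}

for impl in (flag_for_walkthrough, mutant):
    out = impl(USERS, ADMINS)
    print(impl.__name__, weak_check(out), strong_check(out))
# prints:
# flag_for_walkthrough True True
# mutant True False
```

The weak check passes for both versions, so it would never notice the mutation; only the strong check fails when the code changes in a way that matters.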
And then there are the domain-user lenses, my favorite part. For each kind of user your product has, there's one adversarial critic whose job is to read the change through that persona's eyes and find where it breaks for them. Admin, end user, power user become an admin lens, an end-user lens, a power-user lens. Each runs two probes:
- Feature parity: "A capability was added for the admin; should the power user get an analogue?"
- Cross-actor leak: "Will this admin-only feature surface on a screen the end user shares?"
A lens that says "not applicable" has to have run both probes and reasoned them empty. It's a conclusion you earn, never a step you skip.
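One way to enforce "a conclusion you earn" is to make the verdict impossible to compute unless both probes were run and reasoned about. This is a sketch only; `LensReport`, `verdict`, and the probe keys are hypothetical names, not the actual lens format.

```python
from dataclasses import dataclass

PROBES = ("feature_parity", "cross_actor_leak")


@dataclass
class LensReport:
    persona: str
    findings: dict[str, list[str]]  # probe name -> problems found (may be empty)
    reasoning: dict[str, str]       # probe name -> why the result is what it is


def verdict(report: LensReport) -> str:
    """Accept 'not applicable' only if both probes ran and were reasoned empty."""
    skipped = [
        p for p in PROBES
        if p not in report.findings or not report.reasoning.get(p, "").strip()
    ]
    if skipped:
        raise ValueError(f"{report.persona} lens skipped probes: {skipped}")
    if any(report.findings[p] for p in PROBES):
        return "findings"
    return "not applicable"


report = LensReport(
    persona="power user",
    findings={"feature_parity": [], "cross_actor_leak": []},
    reasoning={
        "feature_parity": "Power users have no onboarding flow to configure.",
        "cross_actor_leak": "The setting is not rendered on any shared screen.",
    },
)
print(verdict(report))  # prints not applicable
```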
Finally a documentation librarian does the job every engineer swears they'll do and never does: it reads the merged change and checks that the docs, the changelog, and the API references still tell the truth about what the code now does. Not "were docs added," but "do the docs still match reality." Stale documentation is a bug that ships silently, the one no test suite will ever catch, and it rots the codebase one half-true README at a time. The librarian is the team's memory, and it refuses to let the docs drift from the code.
One hard rule ties the ensemble together: the agents never talk to each other. They run in isolated contexts and return exactly one result; the architect can't hand its spec to a builder, a builder can't hand its diff to QA. Every hand-off is orchestrator-mediated: it reads agent A's output, distills the part agent B needs, and bakes it into B's brief. This isn't a swarm of peers negotiating; it's a manager decomposing work, dispatching isolated specialists, and stitching their one-shot results into the next link of the chain.
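The orchestrator-mediated hand-off can be pictured as a small distillation step. In this sketch, `AgentResult`, `build_brief`, and the output keys are assumptions made for illustration; the idea is that agent B's brief contains only the slice of agent A's output that B needs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentResult:
    agent: str
    output: dict  # whatever the agent returned in its single result message


def build_brief(task: str, upstream: AgentResult, needed_keys: tuple[str, ...]) -> str:
    """Distill the upstream result down to what the next agent needs."""
    missing = [k for k in needed_keys if k not in upstream.output]
    if missing:
        raise KeyError(f"{upstream.agent} did not provide: {missing}")
    lines = [f"Task: {task}", f"Context distilled from {upstream.agent}:"]
    lines += [f"- {key}: {upstream.output[key]}" for key in needed_keys]
    return "\n".join(lines)


architect = AgentResult(
    agent="architect",
    output={
        "contract": "walkthrough_enabled: bool, default False",
        "failing_tests": "tests/test_walkthrough.py",
        "concurrency_notes": "long analysis the frontend does not need",
    },
)
print(build_brief("Build the settings toggle", architect, ("contract", "failing_tests")))
```

The frontend brief here carries the contract and the failing tests but not the concurrency analysis, which keeps each specialist's context small and focused.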
The Moment the Team Caught What One Developer Would Have Missed
The parallelism is nice, but the quality is the real win. Take a neutral example: you ask the team to add an admin capability that enables a new onboarding walkthrough for users. A single competent developer, even a disciplined one, builds exactly that, ships it, and it works. The acceptance criteria are met.
But the end-user lens, the critic whose only job is to think like a regular user, ran its cross-actor leak probe and asked what nobody had: what is this new behavior actually bound to? It was bound to a shared, low-level UI component - the kind of context-free primitive a checkbox or a toggle is - that gets reused across many screens, including ones a regular user sees.
And that's the trap. You can secure a privileged surface by splitting it per role. You cannot secure a primitive: a checkbox is just inputs and outputs, it has no idea whether it's sitting on an admin console or an end-user settings page. The thing that's supposed to decide "may this actor trigger this capability" lives a tier above it, in the controller and service layer, not in the component. The admin behavior had been wired straight onto the shared primitive without that middle-tier gate guarding the specific binding, so the same primitive rendered on an end-user screen quietly inherited the wiring. A regular user, with nothing to do with this admin feature, would have been able to trip it.
No test for the admin feature would catch this, because the admin feature works perfectly. The bug only exists for a different actor than the one you were building for, and it got caught only because a critic's mandate was to ask "what does this do to my user's world?" The solo dev verifies what they built; the team verifies that and its blast radius across every other actor, right down to which tier is actually holding the authorization line.
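The tiering argument can be shown in a few lines. This is a simplified sketch, not the actual code from the incident: `Toggle`, `set_walkthrough`, `render_toggle`, and the role strings are hypothetical. The primitive only forwards a value; the decision about who may trigger the capability lives in the service function it is bound to.

```python
class PermissionDenied(Exception):
    pass


class Toggle:
    """Shared UI primitive: forwards a value to a callback, knows nothing about roles."""

    def __init__(self, on_change):
        self.on_change = on_change

    def click(self, value):
        return self.on_change(value)


def set_walkthrough(actor_role, enabled, settings):
    """Service-layer gate: the authorization decision lives here, not in the Toggle."""
    if actor_role != "admin":
        raise PermissionDenied(f"{actor_role} may not change the walkthrough setting")
    settings["walkthrough_enabled"] = enabled
    return settings


def render_toggle(actor_role, settings):
    # Whichever screen renders the primitive, the binding passes through the gate.
    return Toggle(lambda value: set_walkthrough(actor_role, value, settings))


settings = {"walkthrough_enabled": False}
render_toggle("admin", settings).click(True)
print(settings)  # prints {'walkthrough_enabled': True}

try:
    render_toggle("end_user", settings).click(False)
except PermissionDenied as exc:
    print("blocked:", exc)
```

If the callback wrote to `settings` directly instead of going through `set_walkthrough`, the same primitive on an end-user screen would inherit the wiring, which is the leak described above.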
The Parallel Pipeline: A Cohort of Up to Ten, All Driven to Merge-Ready
Here's the habit that took me longest to unlearn: thinking of work as one task at a time. The orchestrator doesn't. It doesn't pull one issue, finish it, and pull the next. It grabs a cohort of up to ten issues at once and drives them all toward merge-ready concurrently, each in its own worktree with its own subagents and its own pull request.
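As a rough model of the cohort, here is an asyncio sketch. The `drive_to_merge_ready` coroutine is a placeholder for the real per-issue pipeline (worktree, subagents, PR, review cycle), and `MAX_COHORT`, `run_cohort`, and the issue numbers are names and values chosen for illustration.

```python
import asyncio

MAX_COHORT = 10  # "up to ten" issues in flight at once


async def drive_to_merge_ready(issue: int) -> str:
    """Placeholder for one issue's full pipeline; yields control like real I/O would."""
    await asyncio.sleep(0)
    return f"issue-{issue}: merge-ready"


async def run_cohort(backlog: list[int]) -> list[str]:
    """Take at most MAX_COHORT issues and drive them all concurrently."""
    cohort = backlog[:MAX_COHORT]
    return await asyncio.gather(*(drive_to_merge_ready(i) for i in cohort))


results = asyncio.run(run_cohort(list(range(1, 13))))
print(len(results))  # prints 10
print(results[0])    # prints issue-1: merge-ready
```

With twelve issues in the backlog, only the first ten are picked up; the remaining two wait for the next cohort.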