Reading Anthropic's "When AI Builds Itself" Changed How I Think About AI and Software Engineering
DEV Community

I Want to Start With Something Honest

Over the last few months, I have probably read hundreds of posts about AI replacing developers. Some were thoughtful. Some were obviously written just to get clicks. But after a while, they all started blending together, and I noticed something interesting. The loudest opinions almost never came from people discussing the original source material. They came from summaries of summaries, screenshots of tweets, or headlines that focused on a single statistic while leaving out everything around it.

So when Anthropic published "When AI Builds Itself," I decided to read the whole thing instead of waiting for someone else to explain it. I expected to come away more worried. Instead, I came away thinking the conversation online had become much more dramatic than the essay itself.

The numbers are real. The pace is real. The changes happening inside companies like Anthropic are real. None of that should be ignored, and some of the numbers are genuinely astonishing, so I would not call the essay reassuring on its own. But the overall picture is much more nuanced than the internet often makes it sound.

This isn't meant to summarize every page. It's simply how reading the essay changed the way I think about the conversation around AI and software engineering.

One piece of context is worth mentioning before getting into it. Anthropic didn't publish this essay as a prediction about what software engineering might look like someday. They wrote it to explain what they were already seeing inside their own engineering and research teams as AI became a much larger part of their development process. That distinction matters. The essay is mostly describing changes they have already observed, not changes they simply hope will happen.

How It Started and How Fast It Moved

One thing I wasn't expecting was how little time the essay spends making predictions about the future. Instead, it starts by looking backward. The authors walk through how AI gradually became part of Anthropic's own engineering workflow, and that context matters because it changes the way you read everything that comes after it. The essay isn't saying "this is what might happen one day." It's saying "this is how we got here."

In the early years, around 2021 to 2023, things looked much like they would at any other software company. Engineers wrote code, reviewed pull requests, fixed bugs, and made technical decisions. AI wasn't really part of the development process yet.

Then it started helping with smaller tasks. At first, it looked a lot like how many of us use AI today. Generate a function. Explain a piece of code. Suggest a refactor. The engineer was still driving every step, while AI acted more like another tool sitting beside the editor.

Around 2025, that relationship began to change. Instead of only suggesting code, Claude started handling much larger parts of the workflow. It could write files, run them, inspect the output, fix errors, and repeat that cycle several times before a person needed to step in again. The role of the engineer wasn't disappearing, but the amount of hands-on implementation they needed to do was already changing.
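
To make that cycle easier to picture, here is a minimal sketch of a write, run, inspect, fix loop. It is only an illustration, not how Claude Code works internally, which the essay does not describe. The names `run_script`, `agent_loop`, and `toy_propose` are hypothetical, and the "model" is a stand-in function that returns a broken script first and a working one after it sees an error.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Optional


def run_script(source: str) -> subprocess.CompletedProcess:
    """Write the source to a temporary file and run it with the current interpreter."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "attempt.py"
        path.write_text(source)
        return subprocess.run(
            [sys.executable, str(path)],
            capture_output=True,
            text=True,
            timeout=30,
        )


def agent_loop(
    propose: Callable[[Optional[str]], str], max_attempts: int = 3
) -> Optional[str]:
    """Repeat write -> run -> inspect -> fix until a run succeeds or attempts run out."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        source = propose(feedback)   # write: hypothetical stand-in for a model call
        result = run_script(source)  # run
        if result.returncode == 0:   # inspect
            return result.stdout
        feedback = result.stderr     # fix: feed the error into the next attempt
    return None  # out of attempts, so a person needs to step in


def toy_propose(feedback: Optional[str]) -> str:
    """Hypothetical 'model': fails on the first try, succeeds once it has seen an error."""
    if feedback is None:
        return "print(undefined_name)"
    return "print('ok')"


if __name__ == "__main__":
    print(agent_loop(toy_propose))
```

The `max_attempts` cap is the part that maps to "before a person needed to step in again": the loop runs unattended for a bounded number of cycles and then hands control back.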

By 2026, according to the essay, those workflows had become even more autonomous. AI agents were capable of working for much longer periods of time and, in some cases, coordinating work with other agents.

One example from the essay makes that progression much easier to picture. A routine software upgrade unexpectedly caused tens of thousands of AI training jobs to fail. An engineer gave Claude access to the environment along with some context about the problem. Within roughly two hours, Claude identified an obscure configuration flag that was responsible for the failures, verified the fix, and resolved the issue. According to the authors, the same investigation would likely have taken an experienced engineer two or three days.

Stories like that are impressive on their own, but they're still just one example. What convinced me was that the essay backs them up with data. The numbers suggest this wasn't a one-off success but part of a much broader shift inside the company.

The Numbers, Because They Matter

Before talking about what all of this means, it's worth looking at the numbers themselves. They're easy to exaggerate. They're also easy to dismiss. Neither reaction is particularly helpful.

The headline statistic is the one that has probably already made its way around social media. As of May 2026, Anthropic says that more than 80% of the code merged into its production codebase was authored by Claude. Before Claude Code launched in early 2025, that figure was only in the low single digits.

The effect shows up in productivity too. Engineers are now merging roughly eight times more code than they were in 2024. According to the essay, that happened in two noticeable jumps:

  • The first came when Claude moved beyond simply suggesting code and started running it.
  • The second happened when AI agents became capable of working autonomously over much longer periods.

The research side tells a similar story. Anthropic shared results from an internal survey of around 130 researchers. The median response was that people felt they were producing roughly four times as much output when using AI compared to working without it.

The capability benchmarks have also moved quickly. One benchmark measures whether an AI system can successfully reproduce the results of published research papers. Success rates reportedly rose from around 20% in 2024 to near saturation only fifteen months later.

Another measure estimates how long a real-world task AI can reliably complete on its own. According to the essay, that task length has been doubling roughly every four months, growing from tasks that took only a few minutes to tasks lasting around twelve hours.
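
A quick back-of-the-envelope calculation shows what that doubling rate implies. This is a rough sketch, not a figure from the essay: the five-minute starting point is an assumption, since "a few minutes" is not a precise number, and the function names are made up for illustration.

```python
import math


def doublings_needed(start_minutes: float, end_minutes: float) -> float:
    """Number of doublings required to grow from start to end."""
    return math.log2(end_minutes / start_minutes)


def months_needed(
    start_minutes: float, end_minutes: float, doubling_months: float = 4.0
) -> float:
    """Elapsed months if the task length doubles every `doubling_months` months."""
    return doublings_needed(start_minutes, end_minutes) * doubling_months


start = 5.0    # assumption: stand-in for "a few minutes"
end = 12 * 60  # twelve hours, in minutes

print(f"doublings: {doublings_needed(start, end):.1f}")
print(f"months at one doubling per 4 months: {months_needed(start, end):.0f}")
```

Under that assumption, going from five minutes to twelve hours takes a little over seven doublings, which at four months each is between two and two and a half years. A different starting point shifts the result, so treat it as a sense of scale rather than a measurement.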

Those numbers are impressive. What gave me more confidence in them was how openly the authors discussed their limitations. They repeatedly point out the gaps in their own measurements. Lines of code are an imperfect productivity metric. Survey responses can overestimate real productivity gains. Benchmarks don't always capture what happens in real engineering work. That actually made the data more convincing. It felt less like marketing and more like a team trying to explain what they're genuinely seeing inside their own organization.

The Difference Between Execution and Judgment

The most important part of the essay comes after all the numbers. After reading through them, I found myself asking a much simpler question. If Claude is writing most of the code, what are the engineers doing?

The answer, at least from how I read the essay, is that the work developers do isn't disappearing. It's changing.

Claude has become exceptionally good at execution. Give it a clearly defined task, enough context, and the right tools, and it can move through implementation remarkably quickly. It can write code, run experiments, debug issues, test different approaches, and iterate far faster than a person could on repetitive engineering work.

But software engineering has never been only about writing code. Someone still has to decide which problems are worth solving. Someone has to recognize when an experiment is answering the wrong question, even if it technically succeeds. Someone has to look at a result that seems correct and ask whether it actually makes sense within the larger system. Those decisions are much harder to measure than lines of code or benchmark scores, but the essay suggests they remain an important part of where engineers create value.

The authors even tried to measure part of this. They looked at real research sessions where a human made a decision that later turned out to be inefficient or simply wrong. They then showed Claude everything up to that point and asked what it would do next. Their best model improved from choosing the better next step about 51% of the time in late 2025 to around 64% only a few months later. That is meaningful progress. At the same time, it means the model still failed to pick the better step in roughly a third of those cases. On more open-ended decisions, there is still a noticeable gap.

One comparison in the essay helped put that into perspective. The authors describe how responsibilities change as engineers gain experience:

  • Early in a career, much of the work involves implementing tasks that someone else has already defined.
  • With experience comes more responsibility for deciding how those tasks should be approached.
  • Eventually, the focus shifts to which problems deserve attention in the first place.

I don't think that comparison means AI is simply replacing junior engineers while senior engineers stay untouched. Software engineering doesn't work that neatly, and neither does AI. What it suggests is that as implementation becomes easier, the skills around understanding systems, evaluating trade-offs, reviewing work, and making good decisions become even more valuable.

That ended up being my biggest takeaway from the essay. I don't think the discussion is really about whether developers become unnecessary. It's about how the balance of the job changes as one part of software engineering becomes dramatically faster. That's a much more useful way to think about what's happening than reducing the conversation to "AI writes most of the code."

What I Think About This as Someone Early in My Career

I know a lot of people around me who are genuinely worried about AI. Sometimes that worry comes from social media, sometimes from conference talks, and sometimes simply from seeing how quickly these tools are improving. When you read that more than 80% of the production code inside one of the world's leading AI companies is now written by AI, it is difficult not to wonder where that leaves everyone else.

I have had those thoughts too. Reading the essay did not make those questions disappear, but it did change the way I think about them.

The biggest difference for me was that I stopped focusing on the number itself. Eighty percent sounds enormous until you start asking what that 80% actually represents. The essay made me realize I had been measuring software engineering mostly by the amount of code being written, when in reality some of the most valuable work happens long before anyone opens an editor. That shift in perspective made the essay feel much less like a story about replacement and much more like a story about changing workflows.

The more I thought about that, the more it reminded me why we spend so much time learning computer science fundamentals. When you are studying operating systems, networking, databases, algorithms, or distributed systems, it is easy to wonder when you will ever use some of those ideas. They can feel abstract compared to building an application or shipping a feature. But those subjects are not only teaching syntax or APIs. They teach you how to reason about systems. They teach you how to think about trade-offs, understand complexity, identify bottlenecks, and explain why something behaves the way it does. Those skills become more valuable as implementation becomes easier, because they are the skills that help you evaluate whether the implementation is actually correct.

That was the point where my perspective really changed. The fear that developers are being replaced often comes from imagining that writing code is the entire job. Software engineering has never really worked that way. Writing code is important, but so is understanding the problem, designing the system, reviewing solutions, communicating with other engineers, and making decisions when there is no obvious answer.

I am still early in my career, and I know people with much more experience will have different perspectives on this. That is perfectly reasonable. This is simply the conclusion I reached after reading the essay carefully instead of reacting to the headlines surrounding it.

Three Ways This Could Go

The essay avoids something I see a lot in AI discussions. The internet often talks about AI as though there are only two possibilities: either everything changes overnight, or nothing really changes at all. The essay takes a much more measured approach. It lays out several possible directions and is honest that nobody knows with certainty which one we are heading toward.

The trend slows down

The first possibility is that today's rapid progress eventually begins to slow. Every technology reaches limits somewhere, and AI capabilities may face similar constraints. The essay treats this as a plausible outcome rather than dismissing it.
