From Prompting ChatGPT to Orchestrating AI Agents: Two Years as an Ordinary Engineer

Stage 1: AI Was Impressive, but Unreliable

My first experience with ChatGPT was similar to that of many people. The initial response was amazement. The second response was scepticism. It did not take long to discover that a language model could present incorrect information with extraordinary confidence. I learned the term "hallucination," and asking about the model's knowledge cutoff became part of my normal usage.

I later completed introductory AI courses from Andrew Ng and Microsoft. I did not understand every technical detail, but those courses gave me enough context to recognise an important limitation: A model producing a plausible answer is not the same as a system producing a reliable result. That distinction became more important as I moved from asking questions to using AI to modify software.

Stage 2: Cursor Turned AI Into a Development Partner

For me, the real shift began with Cursor. Before Cursor, AI was primarily a place I went to ask for help. With Cursor, AI entered the development environment itself. One Christmas, I spent three days almost entirely at home, building with Cursor. At the end of those three days, I had created my first iOS application. Then I built another one. And another.

Cursor made the feedback loop unusually short: Describe an idea. Generate an implementation. Run it. Observe the failure. Ask for a correction. Repeat. This was my introduction to what later became widely described as Vibe Coding.

The most important change was not that the model wrote code faster than I did. It was that ideas remained alive long enough to become prototypes. Before AI-assisted development, a small idea often died during setup, documentation searches, dependency configuration, or framework research. With AI, the cost of reaching the first working version dropped dramatically.

Rules Were an Early Form of Context Engineering

The early experience was also frustrating. The agent forgot previous decisions. It changed files outside the intended scope. It solved local problems by creating architectural ones. It occasionally behaved as if it had never seen the project before.

The community's response was to rediscover software engineering discipline. We began writing project rules:

  • Read the instructions before changing code.
  • Explain the implementation plan first.
  • Do not delete files without confirmation.
  • Follow the existing architecture.
  • Verify the result after making changes.
  • Ask questions when requirements are ambiguous.

At the time, this was often described as prompt engineering. Looking back, it was an early form of context engineering. We were not merely trying to write better sentences. We were trying to create a stable operating environment for a probabilistic system.

Limited context made this harder. Long conversations regularly lost important information, and opening a new chat was part of the workflow. I even required the agent to refer to itself using a fixed name in every response. It was an imperfect but useful signal that the original instructions were still present.
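The fixed-name signal can also be checked mechanically. The following is a minimal Python sketch, not something from a specific tool: the agent name "Juniper" is a hypothetical placeholder, and it assumes the agent's responses are available as plain strings.

```python
import re
from typing import List, Optional

AGENT_NAME = "Juniper"  # hypothetical canary name defined in the project rules


def instructions_still_loaded(response: str, name: str = AGENT_NAME) -> bool:
    """Return True if the response mentions the agreed name as a whole word."""
    pattern = rf"\b{re.escape(name)}\b"
    return re.search(pattern, response, flags=re.IGNORECASE) is not None


def first_drift(responses: List[str], name: str = AGENT_NAME) -> Optional[int]:
    """Return the index of the first response that omits the name, or None."""
    for index, response in enumerate(responses):
        if not instructions_still_loaded(response, name):
            return index
    return None
```

A missing name does not prove the rules were dropped, and a present name does not prove they were followed. It is only a cheap hint that it may be time to restate the instructions or open a new chat.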

The implementation details have changed, but the underlying problem remains: An agent cannot reliably continue work unless the important context is explicit, accessible, and structured.

AI Exposed My Missing Domain Knowledge

While building my first applications, I noticed that my interfaces were consistently unattractive. Initially, I treated this as a model limitation. Later, I realised that the missing component was my own domain knowledge. AI could generate SwiftUI code, but it could not give me design judgement that I did not possess. It could implement a button, but I still needed to understand:

  • Why the button should be placed in a particular location.
  • How much spacing creates visual hierarchy.
  • What information a user needs before taking an action.
  • Why an interface feels confusing even when every feature works.

AI reduced implementation friction, but it did not remove the need for expertise. In some ways, it made missing expertise more visible. When implementation is slow, it is easy to blame the tools. When implementation becomes fast, weak product decisions, unclear requirements, and poor design judgement become much harder to hide.

Stage 3: Moving Beyond the Chat Interface

As models and tools multiplied, I started using APIs and local inference rather than relying entirely on hosted chat products. I bought a reasonably powerful computer and began downloading models from Hugging Face. One model that surprised me was Kokoro TTS. Its relatively small footprint showed me how much useful capability could exist outside the largest hosted models.

I later integrated text-to-speech into an iOS application using the Sherpa ecosystem. This introduced a new category of engineering problems. Running a model in a desktop experiment is very different from shipping it inside a mobile application. Suddenly, I had to care about:

  • Model size.
  • Memory usage.
  • Startup latency.
  • Packaging.
  • Device compatibility.
  • Offline execution.
  • Failure handling.

This experience reinforced another lesson: A model demo is not a product. The engineering work begins when the model has to operate inside real constraints.

Stage 4: Spec-Driven and Agent-Driven Development

After spending a long time in free-form Vibe Coding loops, Kiro's Spec mode stood out to me. It approached AI development through explicit requirements, design, tasks, and implementation steps. This felt important because code generation was no longer the main bottleneck. The real challenges had become:

  • Maintaining the goal across a long-running task.
  • Preserving constraints.
  • Producing work that could be reviewed.
  • Recording decisions.
  • Enabling another developer or agent to continue.
  • Recovering when the agent moved in the wrong direction.

Spec-driven development felt less magical than free-form prompting, but it felt more transferable to professional engineering. The same transition occurred with project instruction files, skills, goal modes, and agent loops. The industry moved from "Generate this function" toward "Understand this repository, follow these constraints, complete this goal, test the result, and report what changed." That is a fundamentally different type of interaction.
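To make the difference concrete, here is a small Python sketch of a spec as data rather than as a one-off prompt. It is an illustration under my own assumptions, not Kiro's actual file format: the `Spec` and `Task` classes and their field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Task:
    title: str
    done: bool = False


@dataclass
class Spec:
    goal: str
    requirements: List[str]
    constraints: List[str]
    tasks: List[Task] = field(default_factory=list)

    def next_task(self) -> Optional[Task]:
        """Return the first unfinished task, or None when all are done."""
        return next((t for t in self.tasks if not t.done), None)

    def brief(self) -> str:
        """Render the goal, requirements, constraints, and next task as text."""
        task = self.next_task()
        lines = [f"Goal: {self.goal}"]
        lines += [f"Requirement: {r}" for r in self.requirements]
        lines += [f"Constraint: {c}" for c in self.constraints]
        if task:
            lines.append(f"Next task: {task.title}")
        else:
            lines.append("Next task: none, report what changed")
        return "\n".join(lines)
```

Because the goal and constraints are repeated in every brief, they travel with each task instead of depending on the agent remembering an earlier message.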

MCP Connected AI to Existing Systems

The Model Context Protocol generated plenty of debate when it appeared. Some considered it a major architectural shift. Others saw it as an additional abstraction that might eventually be replaced. In my own work, MCP became useful for a practical reason: It offered a relatively lightweight way to connect agents to existing tools and data.

I built an MCP integration for my company's knowledge platform. The organisation already had nearly ten years of accumulated knowledge. The problem was not the absence of information. The problem was making that information available within an agent workflow. Once the knowledge platform was connected, the agent could retrieve information in the context of an actual task.
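The kind of function that sits behind such an integration can be very small. The sketch below is plain Python and deliberately does not use any MCP SDK. It assumes a hypothetical in-memory list of `Article` records and a naive keyword-overlap ranking, neither of which reflects the real platform. An MCP server would expose a function like this to the agent as a tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Article:
    title: str
    body: str


def search_knowledge(articles: List[Article], query: str, limit: int = 3) -> List[Article]:
    """Rank articles by how many distinct query words appear in title or body."""
    words = {w.lower() for w in query.split()}
    scored = []
    for article in articles:
        text = f"{article.title} {article.body}".lower()
        score = sum(1 for w in words if w in text)  # substring match per word
        if score > 0:
            scored.append((score, article))
    # The sort is stable, so articles with equal scores keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [article for _, article in scored[:limit]]
```

A production version would more likely call the platform's own search API. The point of the sketch is the shape: a narrow, typed entry point that returns a bounded number of results the agent can read within a task.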

This represented a major change in how I thought about AI. Two years earlier, I had asked a chatbot when its training knowledge ended. Now I was designing a system that gave an agent access to the organisation's current knowledge. The focus had shifted from model intelligence to system design.

Coding Is Easier, but Engineering Is Not

Modern coding agents can produce remarkable amounts of working code. It is increasingly possible to define a goal, give an agent access to a repository, and let it iterate through implementation and testing. That can make programming appear almost trivial.

My experience has been more complicated. The amount of manual typing has decreased. The need for engineering judgement has not. The work has moved toward:

  • Defining goals precisely.
  • Breaking large tasks into safe boundaries.
  • Supplying relevant context.
  • Preventing unnecessary changes.
  • Reviewing architecture.
  • Validating assumptions.
  • Testing behaviour rather than trusting output.
  • Diagnosing failures that cross multiple generated changes.

A successful build is not necessarily a correct product. A passing test suite is not proof that the requirement was understood. Generated code still becomes code that someone must own. AI has made implementation faster, but faster implementation increases the importance of deciding what should be implemented.

Multi-Agent Development Created a Memory Problem

I now regularly use more than one agent on a project. Claude may begin a task. Kiro or Codex may continue it. Another tool may take over when the first one reaches a limit or struggles with a particular type of work. This sounds like parallel engineering, but the handoff is often the weakest point.

The next agent needs to know:

  • What the original goal was.
  • What has already been implemented.
  • Which decisions were intentional.
  • Which approaches failed.
  • Which files are sensitive.
  • What remains incomplete.
  • What should not be changed.

Without that information, every handoff becomes a partial restart. To solve this problem for my own workflow, I built QiJu, written 起居 in Chinese. QiJu records project changes and provides a reusable memory layer for subsequent agent sessions. Without it, I spend significant time reconstructing context. With it, an agent handoff feels closer to continuing a shared workflow than to starting another isolated chat.
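The list of things the next agent needs to know maps naturally onto a structured record. The Python sketch below illustrates that idea only. It is not QiJu's actual data model, and the `Handoff` class and its field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Handoff:
    goal: str
    implemented: List[str] = field(default_factory=list)
    intentional_decisions: List[str] = field(default_factory=list)
    failed_approaches: List[str] = field(default_factory=list)
    sensitive_files: List[str] = field(default_factory=list)
    remaining: List[str] = field(default_factory=list)
    do_not_change: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the record as plain text, skipping sections with no entries."""
        sections = [
            ("Already implemented", self.implemented),
            ("Intentional decisions", self.intentional_decisions),
            ("Failed approaches", self.failed_approaches),
            ("Sensitive files", self.sensitive_files),
            ("Remaining work", self.remaining),
            ("Do not change", self.do_not_change),
        ]
        lines = [f"Goal: {self.goal}"]
        for heading, items in sections:
            if items:
                lines.append(f"{heading}:")
                lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)
```

The rendered text can be pasted into, or loaded by, whichever agent picks up the task next. Recording failed approaches is the part I would least want to skip, because it is the information most likely to be lost and most expensive to rediscover.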

I no longer think of multi-agent development as "using several AI tools." The engineering challenge is creating continuity between them.

The Most Important Change: Ideas Survive Longer

A few days ago, I spent around thirty minutes building a floating stopwatch for macOS. It was a small tool I had wanted for a long time. The result was not technically significant, but the process represented something larger. Previously, that idea would probably have remained on a list. Building it would have required enough setup and research that the expected benefit did not justify the activation cost.

With AI assistance, the distance between the idea and a usable first version was short enough that the idea survived. That is probably the biggest personal impact AI has had on me. It has not made every idea valuable. It has made experimentation cheap enough that more ideas get the opportunity to prove whether they are valuable.

AI Expanded My Capabilities and Narrowed My Attention

There has also been a personal cost. Because AI and software occupied so much of my attention, I realised that I was entering an information bubble. My daily reading became dominated by models, agents, context windows, IDEs, MCP servers, and product announcements.

To counter that, I began using AI to generate a daily story about something outside my normal interests. History, art, society, individuals, and unfamiliar parts of the world became part of a collection I call "Stories Outside the Bubble." The irony is obvious. AI expanded what I could build while narrowing what I noticed. Then I used AI to widen my attention again.

What I Believe After Two Years

Over these two years, my relationship with AI moved through several stages:

  • AI as a chatbot.
  • AI as a coding assistant.
  • AI as an implementation agent.
  • AI as a component connected to tools and knowledge.
  • AI as part of a multi-agent engineering workflow.

The most valuable lesson is not that prompts are powerful or that coding agents are fast. It is that useful AI systems depend on everything around the model: Context. Domain knowledge. Memory. Tools. Constraints. Evaluation. Human judgement. Continuity between tasks.

I remain an ordinary engineer. But AI is now embedded in how I build applications, use company knowledge, record project decisions, and explore new subjects. I do not know which models or development tools will still be dominant 24 months from now. I do expect the distance between an idea and an implementation to keep shrinking. The difficult part will remain deciding which ideas deserve to be implemented, designing systems that can be trusted, and taking responsibility for what the agents produce.

After writing this retrospective by the ocean, I should probably continue taking the morning off. But I am already thinking about opening Claude Code again.
