DEV Community 1h ago

A Prompt Is a Wish. A Tool Is a Law.

The Shape of the Thing

The platform is a place where anyone - engineers, PMs, designers, QA - can publish a reusable AI tool, and everyone else can use it. Write once, available to all.

A few terms up front, because the whole design leans on them:

MCP (Model Context Protocol) is a standard way for an AI client to discover and call your functions. The key detail: there's a step where the client asks the server "what tools do you have?" and the server answers with a list. Hold onto that - half the design hangs off that one list.
Cloudflare Workers is code that runs on Cloudflare's servers at the network edge instead of your own.
Durable Objects is per-session server-side storage that lives outside the model's context - the finite, token-costing window of everything the model can currently see.

None of this is exotic; what matters is where each piece of state lives.

Under the hood it's three small Workers speaking MCP: a gateway (auth, routing, secrets), a skill-runner, and an agent-runner. Secrets are fetched by the gateway from a secrets manager - never inlined, never handed to the code that runs user logic unless that code is explicitly an action (more on that distinction below).

Here's the part most "AI platform" posts skip: how it's consumed. You don't install fifty separate agents into your Claude client. You connect one MCP server. Every published tool shows up through that single endpoint. That choice is the difference between a platform and a context-bloat machine, and I'll come back to why.

The tools themselves reach the systems a company runs on - issue trackers, chat, docs, the CMS, the analytics warehouse, the payments database. Some of that data is harmless. Some of it is a compliance incident waiting for one careless fetch. The whole design is organized around that asymmetry.

Problem 1: A Prompt Is a Wish. A Tool Is a Law.

The authoring flow is a fixed pipeline: plan it, get the plan approved, generate the files, review your own work, open a PR. A nice orderly flow. The agent refused to respect it. It generated files before the plan was approved. It "reviewed" code by saying "looks good" and immediately opened a PR. It skipped the inconvenient steps and barreled toward the finish, because that's what a model optimizing for "be helpful, complete the task" does. My pipeline existed in my head and in a long instruction file the model treated as a polite suggestion.

I tried the obvious things first, in order of increasing desperation:

Instructions. A system prompt with bold "STOP. Do not write code until the plan is approved." The model reads it, agrees, and writes code anyway when the task seems to call for it. Prompt text is an input the model weighs, not a rule it obeys.
An in-memory state machine. Track the phase in the conversation and refuse to advance. This dies the moment the context is compacted - agents summarize old history to save space, so a fact the model "knew" twenty messages ago silently vanishes, and it forgets what phase it's in.
Hooks. Intercept actions and block the disallowed ones. The model is remarkably good at rerouting around a blocked path, rephrasing, or finding another tool that gets it to the same place.

The pattern across all three: each lives inside the model's reasoning, and anything inside the model's reasoning is negotiable. A model under task pressure rationalizes its way past text reliably enough that you can't depend on it. Prompts still steer the model - they just can't guarantee it, and a production rule needs a guarantee.

So the trick isn't to tell the model the rules better. It's to make the rules a property of the tools. Each step becomes its own tool, and the tools form a graph: a step tool validates that the previous step happened, and only on success does it return the instructions for the next step. The model can't skip ahead, because it physically doesn't have the next instructions until the current gate hands them over - and the gate is the only edge into the next state.

start_building → confirm_plan → submit_for_review → submit_final → create_pull_request

This is the part people get wrong, including me at first: the thing that makes a gate a wall is not that a failed tool call is hard to ignore. The model can ignore an error - it can retry, or route around it, the same way it routed around hooks. What it cannot do is fabricate the next step's instructions, because those only exist inside a validated success response. The determinism is in the server-side state gate - every tool checks the persisted phase before it acts - not in the error. The error is just how the gate says "not yet."

Concretely: the agent calls create_pull_request while the phase is still planning. The gate sees the wrong phase, returns an error, and - the part that matters - never hands back the next step's instructions. The agent isn't forbidden from finishing; it's unable to, because finishing requires words it was never given. State lives server-side, keyed by session, in Durable Object storage - persisted outside the model's context entirely, so the compaction that killed the in-memory version can't touch it.

const fail = (text: string) => ({ isError: true, content: [{ type: "text", text }] });
const ok = (text: string) => ({ content: [{ type: "text", text }] });

export const confirmPlan: ToolDef = {
  name: "confirm_plan",
  description: "Submit your implementation plan. Required before writing any code.",
  inputSchema: planSchema,
  run: async ({ plan }, ctx) => {
    const state = await ctx.storage.get<BuildState>("buildState");
    // fail closed: no session, no progress
    if (!state) return fail("No active session. Call start_building first.");
    if (state.phase !== "planning") {
      return fail(`confirm_plan is only valid during planning. Current phase: ${state.phase}.`);
    }
    // gate on the prior steps, not on the plan's prose: discovery must precede planning
    const missing = unfinishedSteps(state); // checked existing skills + agents? ran discovery?
    if (missing.length) {
      return fail(`Not ready to plan yet - finish first:\n- ${missing.join("\n- ")}`);
    }
    await ctx.storage.put("buildState", { ...state, phase: "building", plan });
    // success == the ONLY source of the next step's instructions
    return ok("Plan accepted. Generate the files now, then call submit_for_review.\n" + BUILD_RULES);
  },
};

The principle in one line: the model doesn't get permission for the next step until a tool confirms the last one. Not a prompt - a program.

submit_final is where "trust but verify" becomes just "verify." It takes the final files and the findings from the model's own code review, and refuses an empty review:

if (!reviewFindings || reviewFindings.length === 0) {
  return fail(
    "review_findings is empty. Re-review the diff and report concrete findings " +
    "(even if you then resolve them). An empty review is not a passing review.",
  );
}

Be honest about what this check buys: it raises the floor, it doesn't guarantee a real review. A model can satisfy length > 0 with one throwaway finding just as it satisfied "looks good." But making zero findings an error turns "looks fine" from an exit into a prompt to look again - and in practice that nudge is worth a lot. It's a floor, not a ceiling.

Problem 2: "Write Some Code" Is Too Much Power. Split It Into Three.

If a non-engineer can author a tool, and a tool is "arbitrary code," then a non-engineer can author arbitrary code against production. That's not a platform. That's an incident generator with a chat interface.

So a "tool" isn't one thing. It's exactly one of three primitives, and the difference between them is the entire safety model:

A skill is pure logic. No fetch. No secrets. No side effects. "Group these payments by error code" is a skill.
An action is the only thing allowed to touch the outside world. Every fetch, every API key, every secret lives here and nowhere else. "Read yesterday's failed payments from the database" is an action.
An agent orchestrates skills and actions into a workflow. It composes; it doesn't reach out.

// skill - pure. Rejected at review if it contains a fetch().
export const groupByErrorCode = defineSkill({
  name: "group_payments_by_error_code",
  run: (payments: Payment[]) =>
    payments.reduce((acc, p) => {
      (acc[p.errorCode] ??= []).push(p);
      return acc;
    }, {} as Record<string, Payment[]>),
});

// action - owns the I/O and the secret. Nothing else does.
export const fetchFailedPayments = defineAction({
  name: "fetch_failed_payments",
  apiKeySecret: "PAYMENTS_DB_TOKEN", // the token comes from the secrets manager at runtime
  run: async ({ since }, ctx) => {
    const res = await fetch(`${ctx.env.PAYMENTS_URL}/failed?since=${since}`, {
      headers: { authorization: `Bearer ${ctx.secrets.PAYMENTS_DB_TOKEN}` },
    });
    return res.json();
  },
});

This is not ceremony. It means the question "can this tool leak payment data?" has a mechanical answer: only if it uses an action that can reach payment data. Skills can't. Agents can't. You audit the actions, and you've audited the blast radius.

None of this is a new idea - it's capability-based security wearing work clothes. A skill has no ambient authority: it can't reach the network because the network was never handed to it. The contribution isn't the principle, it's the threat model it's pointed at: the code's author is a language model optimizing for helpfulness, and the spec is a sentence from someone who can't read the output.

Two honest notes a careful reader will demand:

"Rejected if it contains a fetch" is doing a lot of work - how? Less than the word "analysis" implies, and it's worth being exact. The submit-time check is a regex - /\bfetch\s*\(/ run over the file text - not an AST parse. It catches the honest mistake; it would not stop a determined author (globalThis["fet" + "ch"], a dynamic import(), any indirect reference sails straight past). So treat the static check as a smell test, not a wall.

The real boundary is two structural facts the author can't edit around. First, a skill runs with an empty environment: the runner holds the secrets in memory but hands the skill {}, so a stray fetch has no credentials to authenticate to anything that matters - it could hit a public URL and learn nothing. Second, every secret-holding, network-touching primitive - every action - runs in a separate Worker from the skills, and that's the only Worker the secrets manager is wired into. A skill isn't sandboxed away from fetch; it's quarantined away from credentials. That's the part a fetch() smuggled past the regex still can't beat.

For a junior author the win is the same boundary, flipped. You never hold the database token, so you can't paste it in the wrong place - it never enters your file; it's injected from the secrets manager at runtime, into the action's Worker, after you've shipped. The boundary that protects the company is the boundary that protects you from yourself.

One more thing the action boundary buys: you're not married to one model vendor. An action that needs an LLM can call OpenAI, Gemini, or Claude; the provider is a per-action choice and every key comes from the same secrets manager. The model list lives in config, not code - adding a model is an edit, not a deploy. The platform doesn't care which model your tool talks to, because talking to a model is just another action.

Problem 3: Not Everyone Should See Every Tool - and That's Also Why the Context Stays Clean

A tool that summarizes open issues is fine for everyone. A tool that reads the payments database is not. The dangerous part of an AI tool is rarely what it writes - it's what it can see.

So which tools show up for you is gated by the sensitivity of the data they can reach, not by who authored them. Every primitive carries an optional allowedGroups. Empty means public. Otherwise the platform takes the user's groups from the identity provider (the corporate single-sign-on that already knows which teams you're on) - the same groups that govern who can open which dashboard - and intersects them with the tool's allowedGroups, at the moment it answers "what tools do you have?":

function registerTools(server: McpServer, tools: ToolDef[], user: UserProps) {
  for (const tool of tools) {
    if (!hasAccess(tool.allowedGroups, user.groups)) continue; // not listed for this user
    server.tool(tool.name, tool.inputSchema, tool.run); // thin wrapper over the MCP SDK call
  }
}

const hasAccess = (allowed: string[] | undefined, userGroups: string[]) =>
  !allowed?.length || allowed.some((g) => userGroups.includes(g));

Now the second payoff, the one that surprised me. The same group check that decides who sees what also does context hygiene. A few months in...

Read on DEV Community ↗ ← Back to News

A Prompt Is a Wish. A Tool Is a Law.

The Shape of the Thing

Problem 1: A Prompt Is a Wish. A Tool Is a Law.

Problem 2: "Write Some Code" Is Too Much Power. Split It Into Three.

Problem 3: Not Everyone Should See Every Tool - and That's Also Why the Context Stays Clean

Comments