DEV Community 3h ago

Build a Minimal WebMCP Agent with Playwright and Gemini

Prerequisites

For this example, you need:

Node.js 20+
Google Chrome
A Gemini API key

Prepare the Solution

The first thing we need to do is enable WebMCP in Chrome. WebMCP is still experimental, so for local development it must be enabled through a Chrome flag:

Open Chrome and navigate to chrome://flags/#enable-webmcp-testing
Set the flag to Enabled.
Relaunch Chrome to apply the changes.

After that, we can create a small Node.js project:

mkdir custom-agent
cd custom-agent
npm init -y

Next, install Playwright as a development dependency. The same command installs tsx to run TypeScript files directly, dotenv to read environment variables from a .env file, and TypeScript together with the Node.js type definitions:

npm install -D playwright tsx dotenv typescript @types/node

This gives us everything we need to run TypeScript code, open Chrome and access environment variables. Because the agent will also call an AI model, we need to install the Gemini SDK. For this example, I use @google/genai:

npm install @google/genai

The last preparation step is to add a script to package.json:

{
  "scripts": {
    "agent": "tsx agent.ts"
  }
}

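This script runs the agent.ts file, where we will put the main logic. Start it with npm run agent. To pass a different game URL, append it after a double dash, for example npm run agent -- http://localhost:5173.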

Check if modelContext Exists

Now that the project is prepared, let's create the first version of agent.ts. At this stage, I only want to check whether modelContext is available inside the browser page.

import { chromium } from "playwright";

const gameUrl = process.argv[2] ?? "http://localhost:5173";

async function main() {
  const context = await chromium.launchPersistentContext(
    "./.chrome-agent-profile",
    {
      channel: "chrome",
      headless: false,
      args: ["--enable-experimental-web-platform-features"],
    },
  );

  const page = await context.newPage();
  await page.goto(gameUrl, { waitUntil: "networkidle" });

  const result = await page.evaluate(() => ({
    userAgent: navigator.userAgent,
    hasNavigatorModelContext: "modelContext" in navigator,
    hasDocumentModelContext: "modelContext" in document,
  }));

  console.log(result);

  // Close the browser so the Node.js process can exit cleanly.
  await context.close();
}

main().catch((error) => {
  console.error(error);
  process.exit(1);
});

This code opens Chrome, navigates to the game page, and checks whether modelContext exists on navigator or document. One important detail is that I am not using the Chromium build bundled with Playwright. Instead, I open the real Chrome installed on my machine by calling launchPersistentContext with channel: "chrome", and I pass --enable-experimental-web-platform-features as a launch argument. This matters because WebMCP is still experimental. In my case, the bundled Chromium did not discover the WebMCP tools correctly, while real Chrome with the flag enabled worked.

Note: Because launchPersistentContext creates a local Chrome profile, do not forget to add this folder to .gitignore:

.chrome-agent-profile/

The profile can contain local browser data such as cache, cookies, and other Chrome state. It should not be committed to the repository.

Read Exposed WebMCP Tools

The first check only tells us whether modelContext exists. The next step is to read the tools exposed by the page. We can do that by calling modelContext.getTools() inside the page.evaluate() method:

const result = await page.evaluate(async () => {
  // modelContext is experimental and missing from the DOM typings, hence the cast.
  const modelContext = (navigator as any).modelContext;
  if (!modelContext) {
    return { hasModelContext: false, tools: [] };
  }

  const tools = await modelContext.getTools();
  return {
    hasModelContext: true,
    tools: tools.map((tool: any) => ({
      name: tool.name,
      description: tool.description,
      inputSchema: tool.inputSchema,
      origin: tool.origin,
    })),
  };
});

This code returns the list of tools exposed by the current page. For each tool, I keep only basic metadata: the name, description, input schema and origin. At this point, it is useful to print the result as formatted JSON:

console.log(JSON.stringify(result, null, 2));

This makes it easier to verify that Chrome discovered the WebMCP tools correctly.
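If the formatted JSON gets long, a compact one-line summary per tool can be easier to scan. The sketch below is only an illustration: the WebMcpToolInfo type and the summarizeTools and parseInputSchema helpers are hypothetical names, and it assumes inputSchema may arrive either as a JSON string or as an object, which I have not verified for every Chrome version.

```typescript
// Hypothetical shape of the metadata returned from page.evaluate() above.
// The field names mirror the snippet; the exact types are an assumption.
type WebMcpToolInfo = {
  name: string;
  description?: string;
  inputSchema?: unknown; // may be a JSON string or an object
  origin?: string;
};

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Normalize inputSchema to an object, tolerating a JSON string.
function parseInputSchema(schema: unknown): Record<string, unknown> | null {
  if (typeof schema === "string") {
    try {
      const parsed: unknown = JSON.parse(schema);
      return isRecord(parsed) ? parsed : null;
    } catch {
      return null; // not valid JSON
    }
  }
  return isRecord(schema) ? schema : null;
}

// Build one line per tool: the name plus its top-level parameter names.
function summarizeTools(tools: WebMcpToolInfo[]): string[] {
  return tools.map((tool) => {
    const schema = parseInputSchema(tool.inputSchema);
    const props = schema ? schema["properties"] : undefined;
    const names = isRecord(props) ? Object.keys(props) : [];
    const params = names.length > 0 ? names.join(", ") : "no parameters";
    return `${tool.name} (${params})`;
  });
}
```

Calling summarizeTools(result.tools).forEach((line) => console.log(line)) after the evaluate call would print one short line per discovered tool.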

Execute a WebMCP Tool

Reading tools is useful, but the real goal is to execute them. In my game, one of the exposed tools is called getGameState. It returns the current state of the puzzle, including the map, remaining moves and collected wood. For the first test, I can find this tool by name and execute it directly:

const gameState = await page.evaluate(async () => {
  const modelContext = (navigator as any).modelContext;
  if (!modelContext) {
    throw new Error("modelContext is not available");
  }

  const tools = await modelContext.getTools();
  const getGameStateTool = tools.find(
    (tool: any) => tool.name === "getGameState",
  );
  if (!getGameStateTool) {
    throw new Error("getGameState tool not found");
  }

  return await modelContext.executeTool(getGameStateTool, "{}");
});

This proves that Playwright can open the page, access modelContext, find a WebMCP tool and execute it inside the browser context. However, hardcoding the tool execution like this is not ideal. The agent should be able to execute any tool by name, so I extracted the logic into a reusable helper function:

import type { Page } from "playwright";

export async function executeWebMcpTool<T>(
  page: Page,
  toolName: string,
  args: unknown,
): Promise<T> {
  return await page.evaluate(
    async ({ toolName, args }) => {
      const modelContext =
        (document as any).modelContext ??
        (navigator as any).modelContext;
      if (!modelContext) {
        throw new Error("Model Context API is not available");
      }

      const tools = await modelContext.getTools();
      const tool = tools.find((tool: any) => tool.name === toolName);
      if (!tool) {
        throw new Error(`Tool not found: ${toolName}`);
      }

      const result = await modelContext.executeTool(
        tool,
        JSON.stringify(args),
      );
      return result;
    },
    { toolName, args },
  );
}

This function receives a Playwright Page, the tool name and arguments. It then evaluates code inside the browser page, finds the matching WebMCP tool, serializes the arguments and executes the tool. With this helper, the Node.js code does not need to know the internal implementation of the page. It only needs the tool name and arguments. That is the important bridge: Playwright controls Chrome, Chrome sees the WebMCP tools and our Node.js code can execute them.

Note: In my setup, navigator.modelContext worked reliably, but WebMCP is still experimental, so in the reusable helper I check both document.modelContext and navigator.modelContext.
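Because the helper passes arguments as a JSON string, it seems plausible that a tool result can also come back as a JSON string rather than a structured object. The following sketch handles both cases. The parseToolResult name is hypothetical, the GameState type only mirrors the fields described for getGameState, and the sample values are made up for illustration.

```typescript
// Assumption: executeTool may resolve to a JSON string or to an
// already-structured value. This helper accepts both.
function parseToolResult<T>(raw: unknown): T {
  if (typeof raw !== "string") {
    return raw as T;
  }
  try {
    return JSON.parse(raw) as T;
  } catch {
    // Not JSON: hand the plain text back unchanged.
    return raw as unknown as T;
  }
}

// Mirrors the fields the article describes for getGameState.
type GameState = {
  remainingMoves: number;
  wood: number;
  visibleMap: string[];
};

// Sample payload with invented values, standing in for a real tool result.
const sampleState = parseToolResult<GameState>(
  '{"remainingMoves":12,"wood":0,"visibleMap":["P.G"]}',
);
console.log(sampleState.remainingMoves, sampleState.visibleMap);
```

In the agent, this would wrap the value returned by executeWebMcpTool, so the rest of the Node.js code can work with a typed object regardless of how the browser returned it.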

Create a Minimal Agent Proof of Concept

Now we can connect the WebMCP tool execution with an AI model. For this article, I want to keep the example small. The goal is not to build the full game-playing agent here. The goal is to prove the basic flow:

Send tool definitions to Gemini.
Let Gemini decide which tool it wants to call.
Execute that tool through WebMCP.
Print the result.

The full agent can build on top of this by sending the tool result back to the model and continuing the loop.
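To make that loop concrete, here is a minimal sketch that is independent of both the Gemini SDK and Playwright. All names in it (runAgentLoop, CallModel, ExecuteTool, HistoryEntry) are hypothetical. The model call and the tool execution are passed in as functions, so in a real agent they would be thin adapters around the Gemini service and the WebMCP helper.

```typescript
// Minimal, SDK-independent shapes. All names here are hypothetical.
type ToolCall = { name: string; args: Record<string, unknown> };
type ModelTurn = { toolCall?: ToolCall; text?: string };
type HistoryEntry =
  | { role: "user"; text: string }
  | { role: "model"; toolCall: ToolCall }
  | { role: "tool"; name: string; result: unknown };

type CallModel = (history: HistoryEntry[]) => Promise<ModelTurn>;
type ExecuteTool = (
  name: string,
  args: Record<string, unknown>,
) => Promise<unknown>;

async function runAgentLoop(
  prompt: string,
  callModel: CallModel,
  executeTool: ExecuteTool,
  maxSteps = 5,
): Promise<{ answer: string | undefined; history: HistoryEntry[] }> {
  const history: HistoryEntry[] = [{ role: "user", text: prompt }];

  for (let step = 0; step < maxSteps; step++) {
    const turn = await callModel(history);
    if (!turn.toolCall) {
      // The model answered in text, so the loop is done.
      return { answer: turn.text, history };
    }

    const { name, args } = turn.toolCall;
    history.push({ role: "model", toolCall: turn.toolCall });

    // Execute the requested tool and feed the result back to the model.
    const result = await executeTool(name, args);
    history.push({ role: "tool", name, result });
  }

  // Step budget exhausted without a final text answer.
  return { answer: undefined, history };
}
```

The maxSteps limit is a deliberate design choice: it keeps a confused model from calling tools forever.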

Gen AI SDK

For this example, I use the @google/genai package. We already installed it earlier, so now we can create a small service for communicating with Gemini. Create a new file called genai.service.ts:

import "dotenv/config";
import {
  GoogleGenAI,
  type Content,
  type GenerateContentConfig,
  type GenerateContentResponse,
} from "@google/genai";

export type GenerateRequest = {
  contents: Content[];
  config?: GenerateContentConfig;
};

export class GenaiService {
  private readonly ai: GoogleGenAI;
  private readonly model: string;

  constructor(model: string = "gemini-2.5-flash-lite") {
    this.model = model;
    const apiKey = process.env.GEMINI_API_KEY;
    if (!apiKey) {
      throw new Error("Missing GEMINI_API_KEY in .env");
    }
    this.ai = new GoogleGenAI({ apiKey });
  }

  public async generateContentAsync(
    request: GenerateRequest,
  ): Promise<GenerateContentResponse> {
    const response = await this.ai.models.generateContent({
      model: this.model,
      contents: request.contents,
      config: request.config,
    });
    return response;
  }
}

The implementation is straightforward. The service reads GEMINI_API_KEY from the .env file, creates an instance of GoogleGenAI and exposes one method called generateContentAsync. I also created a small GenerateRequest type. The reason is simple: I only want to expose the properties that this example needs. The original SDK request type contains more options and for this proof of concept that would make the code harder to read.

You also need to create a .env file:

GEMINI_API_KEY=your-api-key

Do not forget to add the .env file to .gitignore, so you do not commit your API key to the repository.

Agent Creation

Now we can put everything together in agent.ts. In this example, the tool definition is hardcoded. That keeps the proof of concept simple and easier to understand. In a more generic version, we could read WebMCP tools from the page and map them into Gemini tool declarations automatically. But that would add more code and I want this article to stay focused on the core idea.

import { chromium, type Page } from "playwright";
import { GenaiService } from "./genai.service";
import {
  FunctionCallingConfigMode,
  type Content,
  type Tool,
} from "@google/genai";

export const tools: Tool[] = [
  {
    functionDeclarations: [
      {
        name: "getGameState",
        description:
          "Get the current board. visibleMap rows run top-to-bottom; each character is x=0 onward. P=player, .=land, W=tree, ~=water, B=bridge, R=rock, and G=goal.",
        responseJsonSchema: {
          type: "object",
          properties: {
            remainingMoves: { type: "number" },
            wood: { type: "number" },
            visibleMap: {
              type: "array",
              items: { type: "string" },
            },
          },
          required: ["remainingMoves", "wood", "visibleMap"],
        },
      },
    ],
  },
];

const gameUrl =
  process.argv[2] ?? "https://tower-before-dusk.gramli.workers.dev";

async function main() {
  const aiService = new GenaiService();

  const context = await chromium.launchPersistentContext(
    "./.chrome-agent-profile",
    {
      channel: "chrome",
      headless: false,
      args: ["--enable-experimental-web-platform-features"],
    },
  );

  const page = await context.newPage();
  await page.goto(gameUrl, { waitUntil: "networkidle" });

  const contents: Content[] = [
    {
      role: "user",
      parts: [
        {
          text: "Inspect the current Tower Before Dusk game state.",
        },
      ],
    },
  ];

  const response = await aiService.generateContentAsync({
    contents,
    config: {
      tools,
      toolConfig: {
        functionCallingConfig: {
          mode: FunctionCallingConfigMode.ANY,
          allowedFunctionNames: ["getGameState"],
        },
      },
    },
  });

  const functionCall = response.functionCalls?.[0];
  if (!functionCall?.name) {
    throw new Error("Gemini did not return a tool call");
  }

  if (functionCall.name !== "getGameState") {
    throw new Error(
      `Gemini requested an unknown tool: ${functionCall.name}`,
    );
  }

  console.log("Gemini tool call: ", functionCall);

  const gameState = await executeWebMcpTool(
    page,
    functionCall.name,
    functionCall.args ?? {},
  );

  console.log("Tool result: ", gameState);

  // Close the browser so the Node.js process can exit cleanly.
  await context.close();
}

main().catch((error) => {
  console.error(error);
  process.exitCode = 1;
});

export async function executeWebMcpTool<T>(
  page: Page,
  toolName: string,
  args: unknown,
): Promise<T> {
  return await page.evaluate(
    async ({ toolName, args }) => {
      const modelContext =
        (document as any).modelContext ??
        (navigator as any).modelContext;
      if (!modelContext) {
        throw new Error("Model Context API is not available");
      }

      const tools = await modelContext.getTools();
      const tool = tools.find((tool: any) => tool.name === toolName);
      if (!tool) {
        throw new Error(`Tool not found: ${toolName}`);
      }

      const result = await modelContext.executeTool(
        tool,
        JSON.stringify(args),
      );
      return result;
    },
    { toolName, args },
  );
}

The flow is simple:

First, the script opens Chrome and navigates to the game page.
Then it sends a prompt to Gemini together with the available tool definition. In this example, Gemini is allowed to call only one function: getGameState.
After Gemini returns a function call, the script validates that the requested function is really getGameState. This is important because the application should never blindly execute arbitrary tool names returned by the model.
Then the script passes the function name and arguments to executeWebMcpTool. The tool is executed inside the browser page through WebMCP and the result is printed to the console.

And that is the proof of concept. Our Node.js script does not call the game directly. It opens the game in Chrome, lets Chrome discover the WebMCP tools, lets Gemini request a tool call, and then uses Playwright to execute the matching WebMCP tool inside a real Chrome browser.
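The validation step is worth isolating once the agent supports more than one tool. A minimal sketch, assuming an allowlist kept on the Node.js side, could look like this. The validateToolCall function and the allowedTools set are hypothetical names, not part of the article's repository.

```typescript
// Hypothetical allowlist: only tools named here may be executed.
const allowedTools: ReadonlySet<string> = new Set(["getGameState"]);

// Loose shape of a model-issued call: both fields may be missing.
type RequestedCall = { name?: string; args?: Record<string, unknown> };

// Returns a validated call, or throws before anything reaches the browser.
function validateToolCall(
  call: RequestedCall | undefined,
  allowed: ReadonlySet<string>,
): { name: string; args: Record<string, unknown> } {
  if (!call?.name) {
    throw new Error("Model did not return a tool call");
  }
  if (!allowed.has(call.name)) {
    throw new Error(`Model requested a tool that is not allowed: ${call.name}`);
  }
  // Default to an empty argument object when the model sent none.
  return { name: call.name, args: call.args ?? {} };
}
```

In agent.ts, the two if checks on functionCall could then be replaced by a single call such as validateToolCall(functionCall, allowedTools), with the validated name and args passed on to executeWebMcpTool.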

What This Proves

This minimal agent demonstrates that you can wire up the Gemini API with WebMCP through Playwright without building a custom Chrome extension. The same approach should work for any web page that exposes WebMCP tools, and because the model is just a name passed to the SDK, you can use the full range of Gemini models, not just the lightweight ones available in the Model Context Tool Inspector.
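To support arbitrary pages, the hardcoded tool definition would have to be replaced by a mapping from the discovered WebMCP tools to Gemini function declarations. The sketch below shows one possible mapping. It uses local stand-in types instead of the SDK types, and it assumes that the SDK accepts a parametersJsonSchema field next to the responseJsonSchema field used in agent.ts, and that inputSchema may be a JSON string or an object. Treat both as assumptions to verify against the SDK version you install.

```typescript
// Hypothetical metadata shape, mirroring the fields read via getTools().
type DiscoveredTool = {
  name: string;
  description?: string;
  inputSchema?: unknown; // JSON string or object (assumption)
};

// Local stand-in for the SDK's function declaration type (assumed fields).
type FunctionDeclarationLike = {
  name: string;
  description: string;
  parametersJsonSchema: Record<string, unknown>;
};

// Accept a JSON string or an object; fall back to an empty object schema.
function normalizeSchema(schema: unknown): Record<string, unknown> {
  let value = schema;
  if (typeof value === "string") {
    try {
      value = JSON.parse(value);
    } catch {
      value = undefined;
    }
  }
  if (typeof value === "object" && value !== null && !Array.isArray(value)) {
    return value as Record<string, unknown>;
  }
  return { type: "object", properties: {} };
}

// Map each discovered WebMCP tool to one function declaration.
function toFunctionDeclarations(
  tools: DiscoveredTool[],
): FunctionDeclarationLike[] {
  return tools.map((tool) => ({
    name: tool.name,
    description: tool.description ?? "",
    parametersJsonSchema: normalizeSchema(tool.inputSchema),
  }));
}
```

The returned array would take the place of the functionDeclarations list in the tools constant, and the tool names could double as the allowed function names.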

Repositories

The source code for this article and the Tower Before Dusk puzzle game are available in the following repositories:

custom-agent - the minimal agent built in this article
tower-before-dusk - the puzzle game that exposes WebMCP tools

Summary

WebMCP lets a web page expose tools that AI agents can discover and execute inside the browser. By combining Playwright with the Gemini API, you can create a simple agent that opens a real Chrome browser, discovers WebMCP tools, and executes them based on AI model decisions. This approach avoids the overhead of building a custom Chrome extension while still providing access to the full browser context that WebMCP requires.
