BrowserAct Hit #1 on Product Hunt, So I Decided to Test It 🧙‍♂️
DEV Community

What is BrowserAct?

BrowserAct is a browser automation CLI built specifically for AI agents. Traditional browser automation tools usually focus on opening pages, clicking buttons, and extracting information from websites. That works well for simple tasks, but AI agents often require more than basic automation. Real websites contain authentication flows, dynamic rendering, browser fingerprint checks, session management, and situations where human intervention becomes necessary.

BrowserAct attempts to solve these challenges by treating the browser as an execution environment for agents rather than as a temporary browser window. Some of its capabilities include:

  • Browser session persistence
  • Browser identity isolation
  • Human handoff support
  • Parallel browser sessions
  • Anti-detection browser environments
  • Reusable workflows and skills

What Problem Is BrowserAct Trying To Solve?

After building and testing AI agents, I have noticed that many workflows do not fail because of the language model. The model usually knows what it needs to do. The problem starts when the workflow enters a browser.

For example, consider an AI agent that monitors competitor pricing. On paper, the task sounds straightforward: open a website, extract information, generate a report, and send the output. Reality usually looks different. The website might require authentication. Content may be rendered dynamically. Session state may expire. Security checks can appear unexpectedly. Browser fingerprints may trigger additional verification.

At this point the issue is no longer intelligence. The issue becomes execution. BrowserAct seems designed around that exact problem.

Getting Started with BrowserAct

Getting started with BrowserAct is straightforward, and it can fit naturally into both command-line workflows and AI agent environments. Depending on how you plan to use it, there are two common installation approaches.

1. Install Through an AI Agent (Recommended for Agent Workflows)

If you're using an AI coding agent or an environment that supports skills and tool integrations, BrowserAct can be installed directly as a skill.

npx skills add browser-act/skills --skill browser-act

This allows the agent to invoke BrowserAct capabilities directly inside larger workflows. Instead of writing custom browser automation logic every time, the agent can use BrowserAct whenever browser interaction becomes necessary.

For example, an agent can:

  • Open websites
  • Navigate between pages
  • Handle login flows
  • Fill forms
  • Extract structured content
  • Continue browser-based workflows

This approach is useful when BrowserAct becomes part of a broader AI system rather than a standalone browser tool.

2. Install BrowserAct CLI Directly

If you prefer working directly from the terminal, BrowserAct also provides a CLI installation option.

uv tool install browser-act-cli --python 3.12

After installation, authenticate your local environment:

browser-act auth login
browser-act auth poll

You can also directly configure an API key:

browser-act auth set YOUR_API_KEY

Once authentication is complete, BrowserAct is ready to execute browser workflows locally.
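The two authentication paths above can be wrapped in one small helper. This is only a sketch: `BROWSER_ACT_API_KEY` and `BROWSER_ACT` are variable names I made up for the example (they are not BrowserAct settings), and it assumes `auth poll` is meant to run right after `auth login`, as the order of the commands above suggests.

```shell
# Hypothetical variable: lets you point the helper at a different binary.
BROWSER_ACT="${BROWSER_ACT:-browser-act}"

# Choose an authentication path based on whether an API key is available.
browseract_auth() {
  if [ -n "${BROWSER_ACT_API_KEY:-}" ]; then
    # Non-interactive path: configure the key directly.
    "$BROWSER_ACT" auth set "$BROWSER_ACT_API_KEY"
  else
    # Interactive path: start the login, then poll for its completion.
    "$BROWSER_ACT" auth login && "$BROWSER_ACT" auth poll
  fi
}
```

Calling `browseract_auth` with the key exported suits CI-style environments, while leaving the variable unset falls back to the interactive login.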

Rather than jumping directly into advanced workflows, I wanted to verify that the installation was working correctly and understand the basic command flow. The first thing I checked was the browser profiles available on my system.

browser-act browser list-profiles

The output lists available browser profiles that BrowserAct can use during execution. These profiles become useful when creating isolated browser environments, preserving state across sessions, or reusing existing login contexts.

Next, I created a browser using my existing Chrome profile.

browser-act browser create \
  --type chrome \
  --source-profile local_profile_182885126174998716 \
  --name "browseract" \
  --desc "testing-browseract"

BrowserAct automatically imports available browser state such as cookies and local storage data. Now that the browser environment was ready, I wanted to test a simple content extraction workflow.

First Test: Extracting Website Content

For the first test, I used BrowserAct's stealth-extract command on a simple website.

browser-act stealth-extract https://example.com --content-type markdown

Output:

# Example Domain

This domain is for use in documentation examples without needing permission.

[Learn more](https://iana.org/domains/example)

The result was returned as clean markdown without writing selectors, parsing HTML, or creating custom scraping logic.
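Since the command writes markdown to standard output, it should be easy to run over several pages and save one file per URL. The sketch below assumes exactly that behavior. The function names, the `BROWSER_ACT` variable, and the file-naming scheme are my own choices for illustration; only the `stealth-extract ... --content-type markdown` invocation comes from the test above.

```shell
# Hypothetical variable: lets you point the helpers at a different binary.
BROWSER_ACT="${BROWSER_ACT:-browser-act}"

# Turn a URL into a safe file name, e.g. https://example.com/a -> example.com_a.md
url_to_filename() (
  name="${1#*://}"   # drop the scheme
  name="${name%/}"   # drop one trailing slash
  # Replace every character outside the safe set with an underscore.
  printf '%s.md\n' "$(printf '%s' "$name" | tr -c 'A-Za-z0-9._-' '_')"
)

# Usage: extract_all OUTPUT_DIR URL...
extract_all() (
  out_dir="$1"; shift
  mkdir -p "$out_dir"
  for url in "$@"; do
    # One markdown file per URL; report failures without stopping the loop.
    "$BROWSER_ACT" stealth-extract "$url" --content-type markdown \
      > "$out_dir/$(url_to_filename "$url")" \
      || echo "extract failed: $url" >&2
  done
)
```

For example, `extract_all ./pages https://example.com` would write `./pages/example.com.md`. A failed extraction leaves an empty or partial file behind, so check the messages on standard error before trusting the output directory.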

Second Test: Opening a Browser Session

Next, I wanted to see how BrowserAct handled an interactive browser session. I opened a browser session using the browser profile I created earlier.

browser-act \
  --session first-test \
  browser open \
  chrome_local_102863481715294440 \
  https://github.com

The output returned details about the active session:

session_name=first-test
browser_type=chrome
url=https://github.com/
title=GitHub

At this point, the browser was active and ready for interaction.
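Because the session details come back as `key=value` lines, a script can pick out a single field with standard tools. This is a rough sketch that assumes the real output has exactly the shape shown above, which I have not verified beyond this one run. `get_field` is a helper name I made up.

```shell
# Print the value of one key from key=value lines on stdin.
get_field() {
  # $1 is the key; keep only what follows "key=" on the matching line.
  sed -n "s/^$1=//p"
}

# Sample copied from the session output shown above.
session_output='session_name=first-test
browser_type=chrome
url=https://github.com/
title=GitHub'

printf '%s\n' "$session_output" | get_field url    # prints https://github.com/
```

In a real script, the sample variable would be replaced by the captured output of the `browser open` command.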

Inspecting Page State

One of the interesting parts of BrowserAct is that it does not expose raw HTML directly. Instead, it generates a structured view of interactive elements that agents can work with.

browser-act \
  --session first-test \
  state

This allows interactions to happen using element references rather than manually creating CSS selectors. To test this further, I clicked the search button:

browser-act \
  --session first-test \
  click 6

At this point, I had run a full workflow (Create Browser → Open Session → Inspect → Input → Click → Navigate) without writing a browser automation script.
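Repeating `--session first-test` on every command gets tedious, so a tiny wrapper helps. This sketch relies only on the `--session` flag used above. The `ba` function and the `SESSION` and `BROWSER_ACT` variables are names invented for the example.

```shell
# Hypothetical variables: the binary to call and the session to target.
BROWSER_ACT="${BROWSER_ACT:-browser-act}"
SESSION="${SESSION:-first-test}"

# Run any BrowserAct subcommand against the current session.
ba() {
  "$BROWSER_ACT" --session "$SESSION" "$@"
}
```

With this in place, `ba state` and `ba click 6` correspond to the two commands above. The element number comes from the most recent `state` output, so it is worth re-running `ba state` after the page changes.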

Third Test: Using BrowserAct with an AI Agent

After testing BrowserAct directly through the CLI, I wanted to see how it behaved inside an actual AI agent workflow. For this experiment, I used Codex as the agent runtime.

One advantage of BrowserAct is that it does not need to live as a standalone tool. We can instruct the agent to invoke BrowserAct whenever browser interaction becomes necessary. That includes opening websites, clicking buttons, handling login flows, extracting structured content, or interacting with dynamic pages.

To test this, I asked the agent to install BrowserAct and use it as part of its workflow. Here is the prompt I provided:

Set up BrowserAct for me. Read the BrowserAct skill first: https://github.com/browser-act/skills/blob/main/browser-act/SKILL.md

Install or update the browser-act skill, then verify it works. Use BrowserAct when I need an AI agent to browse, click, fill forms, handle login flows, solve CAPTCHAs, bypass bot detection, or extract structured data from websites.

After setup, open this repository in my browser: https://github.com/browser-act/skills

If I am logged in to GitHub, ask me whether you should star it for me as a quick demo that browser interaction works. Only click the star if I explicitly say yes.

After receiving the instructions, the agent started executing the workflow automatically. The workflow looked something like this:

Agent starts
↓
Installs BrowserAct skill
↓
Verifies installation
↓
Opens GitHub repository
↓
Checks authentication state
↓
Requests confirmation
↓
Performs browser action

The interesting part was watching the agent use BrowserAct as a tool rather than relying on static instructions. Browser actions became part of the execution process instead of separate manual steps.

After receiving confirmation, the agent completed the task successfully and starred the repository. Although starring a GitHub repository is a simple action, it demonstrates an important capability. The agent was able to open a real website, maintain browser state, interact with UI elements, and complete an action inside an authenticated environment. This feels much closer to real-world AI workflows than isolated browser automation scripts.

Fourth Test: Human Handoff with an AI Agent

The feature that interested me most was human handoff. Many automation workflows eventually reach a point where the process cannot continue without human participation. Common examples include OTP verification, QR login, enterprise SSO approval, and security confirmations.

To test this behavior, I used Codex together with BrowserAct and created a workflow that required OTP verification. My objective was straightforward:

  • Let the agent open the login page
  • Enter the email automatically
  • Pause when OTP is required
  • Preserve browser state
  • Resume execution after human input

I provided the following prompt to the agent:

Use BrowserAct for this workflow.

Open: https://practice.expandtesting.com/otp-login

Actions:

  1. Launch BrowserAct
  2. Open the website
  3. Continue the login workflow
  4. If human interaction becomes necessary for OTP, preserve browser state and use BrowserAct's collaboration capability
  5. Resume execution after collaboration completes

The agent launched BrowserAct, created a browser session, and navigated to the login page. When the workflow reached the authentication stage, BrowserAct detected that human interaction was required and generated a collaboration link. I opened the collaboration link and completed the login process manually. Once authentication finished, control automatically returned to the agent without restarting the browser session.

The interesting part was that the browser state remained active throughout the interruption. The agent paused execution, handed control to a human when necessary, and then continued from the exact same session once the required action had been completed. For production workflows involving authentication, verification, or approval steps, this feels significantly more practical than forcing complete automation.

Fifth Test: Running Multiple Browser Sessions

Another scenario I wanted to test was running multiple independent browser sessions. Long-running AI systems rarely perform a single task. They may monitor dashboards, analyze feedback, review customer activity, and collect information simultaneously.

I created several browser sessions locally.

browser-act --session reviews browser open chrome_local_102863481715294440 https://reddit.com
browser-act --session ops browser open chrome_local_102863481715294440 https://status.openai.com
browser-act --session community browser open chrome_local_102863481715294440 https://dev.to

I then listed active sessions:

browser-act session list

The output showed multiple active sessions operating independently. This separation becomes useful because each workflow maintains its own execution state while avoiding interference with other tasks.
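When the list of sessions grows, the same commands can be driven from a simple "name url" list. This is a minimal sketch that reuses the `browser open` invocation from this test. `open_sessions`, `BROWSER_ID`, and `BROWSER_ACT` are illustrative names of my own, and the default browser ID is the one from my machine.

```shell
# Hypothetical variables: the binary to call and the browser to open sessions in.
BROWSER_ACT="${BROWSER_ACT:-browser-act}"
BROWSER_ID="${BROWSER_ID:-chrome_local_102863481715294440}"

# Open one named session per "name url" line read from stdin.
open_sessions() {
  while read -r name url; do
    [ -n "$name" ] || continue   # skip blank lines
    # </dev/null keeps the command from consuming the loop's input.
    "$BROWSER_ACT" --session "$name" browser open "$BROWSER_ID" "$url" </dev/null \
      || echo "failed to open session: $name" >&2
  done
}
```

Piping the lines `reviews https://reddit.com`, `ops https://status.openai.com`, and `community https://dev.to` into `open_sessions` should issue the same three commands as above, after which `browser-act session list` can confirm what is running.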

Final Thoughts

After spending some time testing BrowserAct, I understand why it attracted attention during its Product Hunt launch. The value does not come from simply opening browsers and clicking buttons. The interesting part is treating the browser as an execution layer for AI agents and handling the problems that appear in real environments.

Thank You!! 🙏

Thank you for reading this far. If you found this article useful, please like and share it. Someone else might find it useful too. 💖

Connect with me on X, GitHub, LinkedIn

Kiran Naragund - Tech Writer and Moderator @DEV ✦ Full-Stack Developer ✦ Mentor @Exercism ✦ Open-Source Contributor ✦ Email for Collabs :)
