Tool Permission Matrix Builder & Validator: Structured, Visual Policy Management for AI Agent Teams
DEV Community


What This Platform Does

The platform addresses the full lifecycle of agent tool governance in one place. It starts with tool registration - each tool is defined and assigned a risk category: read-only, internal-write, external-api, financial, destructive, or administrative. Roles are then created for each agent type - analyst, operator, admin, readonly-bot, or whatever the team's structure requires.

The permission matrix takes these two dimensions and lets permissions be assigned by dragging tools onto roles or clicking individual cells to toggle between allowed, denied, and inherited states. The matrix validates in real time: if a role has access to a tool whose risk level exceeds what that role should have, a warning appears immediately.

Once the matrix is configured, a policy artifact is exported - JSON for machine consumption, YAML for GitOps workflows, or a Python module with a check_permission(role, tool) function that can be imported directly into agent code.
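As a rough illustration of what the Python export might look like, here is a minimal sketch. The PERMISSIONS table, the role names, and the tool names are assumptions made for the example; only the check_permission(role, tool) signature comes from the description above, and the real generated module may be structured differently.

```python
# Hypothetical shape of an exported permissions.py. The real generated
# module may differ; role and tool names here are illustrative only.
PERMISSIONS: dict[str, frozenset[str]] = {
    "analyst": frozenset({"search_docs", "read_report"}),
    "operator": frozenset({"search_docs", "restart_service"}),
}


def check_permission(role: str, tool: str) -> bool:
    """Return True only if the role is explicitly granted the tool."""
    # Unknown roles get an empty grant set, so the default is deny.
    return tool in PERMISSIONS.get(role, frozenset())


if __name__ == "__main__":
    print(check_permission("analyst", "search_docs"))
    print(check_permission("analyst", "restart_service"))
```

Agent code would import check_permission and call it before dispatching a tool; the default-deny behavior for unknown roles is a design choice of this sketch, not something the source confirms.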

On the validation side, existing agent code can be pasted in; Claude analyzes which tools it actually calls, cross-checks those against the matrix, and produces a security score with sorted recommendations. A separate sprawl analysis detects over-exposure: roles with too many high-risk tools, tools granted to too many roles, and unused grants.
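The cross-check step can be pictured with a small sketch. It assumes the tool names have already been extracted from the agent code and that the matrix has been flattened to a role-to-granted-tools mapping; CrossCheckReport and cross_check are hypothetical names, and no scoring formula is shown because the source does not describe one.

```python
from dataclasses import dataclass, field


@dataclass
class CrossCheckReport:
    """Buckets for the tools an agent calls (hypothetical structure)."""
    allowed: list[str] = field(default_factory=list)
    denied: list[str] = field(default_factory=list)
    unregistered: list[str] = field(default_factory=list)


def cross_check(
    role: str,
    called_tools: list[str],
    matrix: dict[str, set[str]],
    registered_tools: set[str],
) -> CrossCheckReport:
    """Sort each called tool into allowed, denied, or unregistered."""
    report = CrossCheckReport()
    for tool in sorted(set(called_tools)):
        if tool not in registered_tools:
            report.unregistered.append(tool)   # not in the tool registry
        elif tool in matrix.get(role, set()):
            report.allowed.append(tool)        # granted to this role
        else:
            report.denied.append(tool)         # registered but not granted
    return report
```

A report with non-empty denied or unregistered lists is the kind of finding that would feed the recommendations.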

Architecture

The backend is fully async - all 24 routes use async def with an aiosqlite-backed SQLAlchemy session. This is intentional: the Claude API calls in agent validation and sprawl analysis can take 5–15 seconds, and a blocking call of that length would stall the worker handling it and leave other users waiting. With async, the event loop keeps serving other requests while those calls are in flight.
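The effect is easy to demonstrate outside the project. The sketch below is not taken from the repository: slow_model_call is a hypothetical stand-in that uses asyncio.sleep in place of a slow upstream API call, and it shows several such calls overlapping instead of running back to back.

```python
import asyncio
import time


async def slow_model_call(request_id: int, delay: float = 0.1) -> str:
    # asyncio.sleep stands in for awaiting a slow upstream API response.
    await asyncio.sleep(delay)
    return f"request {request_id} done"


async def handle_many(n: int) -> tuple[list[str], float]:
    """Run n simulated requests concurrently and report elapsed time."""
    start = time.perf_counter()
    results = await asyncio.gather(*(slow_model_call(i) for i in range(n)))
    return list(results), time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = asyncio.run(handle_many(5))
    # Five 0.1 s waits overlap, so the total stays far below 0.5 s.
    print(len(results), f"{elapsed:.2f}s")
```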

Both AI services have heuristic fallbacks. If ANTHROPIC_API_KEY is not set, the agent validator still extracts tool call patterns from code using regex and checks them against the matrix, and the sprawl analyzer still computes numerical sprawl metrics. The Claude path produces richer narrative and nuanced recommendations; the heuristic path still provides actionable data.
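A minimal sketch of the heuristic path might look like the following. The regular expression is an assumption that covers only literal use_tool("...") and call_tool("...") calls, and extract_tool_calls and analysis_mode are hypothetical names; the project's actual patterns are likely broader.

```python
import os
import re

# Assumed pattern: matches use_tool("name") or call_tool('name') where the
# tool name is a string literal. Dynamic tool names are not detected.
TOOL_CALL_PATTERN = re.compile(
    r"""\b(?:use_tool|call_tool)\(\s*["']([A-Za-z_][A-Za-z0-9_]*)["']"""
)


def extract_tool_calls(code: str) -> list[str]:
    """Return distinct tool names in order of first appearance."""
    seen: list[str] = []
    for name in TOOL_CALL_PATTERN.findall(code):
        if name not in seen:
            seen.append(name)
    return seen


def analysis_mode() -> str:
    """Pick the analysis path based on whether an API key is configured."""
    return "claude" if os.environ.get("ANTHROPIC_API_KEY") else "heuristic"
```

Regex extraction cannot see tool names that are built at runtime, which is one reason a model-based pass can surface more than the fallback.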

The policy generator produces three output formats from the same matrix data. The Python module output is syntax-verified via py_compile before being returned, so a file with a syntax error is never offered for download.
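The syntax gate can be approximated with the standard library alone. This is a sketch, not the project's code: is_compilable_source is a hypothetical helper that writes the generated source to a temporary file and compiles it with py_compile.compile(..., doraise=True). Note that this checks syntax only; it does not execute the module or resolve its imports.

```python
import os
import py_compile
import tempfile


def is_compilable_source(source: str) -> bool:
    """Return True if the source compiles to bytecode without errors."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "permissions.py")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(source)
        try:
            # doraise=True turns compile failures into PyCompileError
            # instead of printing them to stderr.
            py_compile.compile(
                path, cfile=os.path.join(tmp, "permissions.pyc"), doraise=True
            )
        except py_compile.PyCompileError:
            return False
    return True
```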

The repo also includes architecture.svg.

Quick Start

Prerequisites

  • Python 3.11 or newer
  • Node.js 18 or newer
  • An Anthropic API key (optional - Claude features fall back to heuristic analysis without it)

Set up the environment

cp .env.example .env
# Optionally add ANTHROPIC_API_KEY=sk-ant-your-key-here for Claude analysis

Run the backend

cd backend
pip install -r requirements.txt
uvicorn main:app --reload --host 0.0.0.0 --port 8000

The API starts at http://localhost:8000. Swagger UI is available at http://localhost:8000/docs.

Run the frontend

The frontend runs in a separate process:

cd frontend
npm install
npm run dev

The UI opens at http://localhost:5173.

Run with Docker

cp .env.example .env
docker compose up --build

The backend runs on port 8000 with a health check. The frontend serves via nginx on port 80 and waits for the backend health check before starting.

Running Tests

cd backend && python -m pytest tests/ -v

The suite contains 22 tests covering policy generation (JSON, YAML, and Python), the agent validator, and heuristic analysis. It runs in under a second.

API Reference

Risk Categories

All six risk categories are implemented as a Python Enum and stored in the database. The permission matrix UI shows these risk colors on every tool badge, and real-time validation warnings fire when a role's allowed risk levels would be exceeded.
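For orientation, the six categories could be modeled roughly as below. The member names and string values are assumptions derived from the category names used in this article, and risk_warnings is a hypothetical helper showing one way a warning could be computed from a role's allowed risk levels.

```python
from enum import Enum


class RiskCategory(str, Enum):
    # Values mirror the category names in the article; the actual Enum
    # in models.py may use different member names or values.
    READ_ONLY = "read-only"
    INTERNAL_WRITE = "internal-write"
    EXTERNAL_API = "external-api"
    FINANCIAL = "financial"
    DESTRUCTIVE = "destructive"
    ADMINISTRATIVE = "administrative"


def risk_warnings(
    allowed_risks: set[RiskCategory],
    granted_tools: dict[str, RiskCategory],
) -> list[str]:
    """List granted tools whose category is outside the role's allowed set."""
    return [
        f"{tool}: {risk.value} exceeds role's allowed risk levels"
        for tool, risk in sorted(granted_tools.items())
        if risk not in allowed_risks
    ]
```

Treating the allowed levels as a set rather than an ordered scale is a choice of this sketch; the source does not say whether the categories are ranked.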

Project Structure

tool-permission-matrix/
├── backend/
│   ├── main.py                # FastAPI app, 24 async routes
│   ├── models.py              # Tool, Role, Permission ORM + RiskCategory Enum
│   ├── schemas.py             # Pydantic v2 request/response schemas
│   ├── database.py            # Async SQLite via aiosqlite
│   ├── services/
│   │   ├── policy_generator.py # JSON, YAML, and Python module export
│   │   ├── agent_validator.py  # Claude + heuristic agent code analysis
│   │   └── sprawl_analyzer.py  # Claude + heuristic sprawl detection
│   ├── requirements.txt
│   ├── Dockerfile.backend
│   └── tests/
│       ├── test_policy_generator.py  # 11 policy generation tests
│       ├── test_validator.py         # 11 validator tests
│       └── fixtures/
│           ├── sample_agent.py       # Realistic agent with tool call patterns
│           └── sample_policy.json    # Realistic permission matrix fixture
├── frontend/
│   ├── src/
│   │   ├── App.tsx                   # Tab layout: Tools/Roles/Matrix/Export/Validate/Sprawl
│   │   ├── stores/
│   │   │   ├── toolStore.ts          # Zustand store for tool state
│   │   │   ├── roleStore.ts          # Zustand store for role state
│   │   │   └── matrixStore.ts        # Zustand store for permission matrix
│   │   ├── components/
│   │   │   ├── ToolRegistry.tsx      # CRUD + filter + JSON import/export
│   │   │   ├── RoleManager.tsx       # CRUD + inheritance + risk levels
│   │   │   ├── PermissionMatrix.tsx  # @dnd-kit DnD grid
│   │   │   ├── PolicyExporter.tsx    # Format selector + download
│   │   │   ├── AgentValidator.tsx    # Paste/upload + results display
│   │   │   └── SprawlAnalysis.tsx    # Sprawl score + issues list
│   │   ├── api/client.ts            # axios-based API client, 20 methods
│   │   └── types/index.ts           # TypeScript interfaces (28 types)
│   ├── Dockerfile.frontend
│   ├── package.json
│   └── vite.config.ts
├── docker-compose.yml
└── .env.example

The structure keeps the platform's responsibilities separate. The backend/services/ directory holds the three pieces that do the heavy lifting - policy generation, agent validation, and sprawl analysis - each isolated from the routing layer in main.py. The frontend mirrors this with one component per tab in the UI, with tool state, role state, and matrix state each managed by a dedicated Zustand store.

Key Design Decisions

Async throughout - All backend routes are async def and the SQLAlchemy session uses aiosqlite. The Claude API calls in agent validation and sprawl analysis can take 5–15 seconds. A blocking call of that length would stall the worker handling it and leave other users waiting; with the async design, the event loop keeps serving other requests while those calls are in flight.

Three-state permission model - Each matrix cell is ALLOWED, DENIED, or INHERITED - not just a binary toggle. INHERITED means the permission comes from the role's parent role, enabling role hierarchies where a base role defines conservative defaults and derived roles override specific tools.
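One way to resolve an INHERITED cell is to walk up the parent chain until an explicit state is found. The sketch below assumes single-parent inheritance, treats a missing cell as INHERITED, and falls back to deny when no ancestor decides; CellState and resolve are hypothetical names, and the project may resolve inheritance differently.

```python
from enum import Enum
from typing import Optional


class CellState(Enum):
    ALLOWED = "allowed"
    DENIED = "denied"
    INHERITED = "inherited"


def resolve(
    role: str,
    tool: str,
    cells: dict[tuple[str, str], CellState],
    parents: dict[str, Optional[str]],
) -> bool:
    """Walk from the role to its ancestors until a cell decides."""
    seen: set[str] = set()
    current: Optional[str] = role
    while current is not None and current not in seen:
        seen.add(current)  # guards against accidental cycles in parents
        state = cells.get((current, tool), CellState.INHERITED)
        if state is CellState.ALLOWED:
            return True
        if state is CellState.DENIED:
            return False
        current = parents.get(current)
    return False  # assumption: nothing explicit anywhere means deny
```

With a base role that denies a destructive tool and a derived role that marks the same cell ALLOWED, the derived role's explicit state wins because it is found first.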

Heuristic fallback for AI features - Claude-powered features are never the only path. The agent validator extracts tool calls using regex patterns that cover the most common calling conventions, then checks them against the matrix. The sprawl analyzer computes over-exposure metrics numerically. The platform is fully usable in restricted environments without an API key; Claude's analysis is an enhancement rather than a dependency.

Policy Python module verification - When generating a Python module, py_compile is called on the output before returning it. A permissions.py that fails to compile would be worse than no policy at all, so this check runs as a hard gate.

Verified Results

The backend ships with 22 tests covering policy generation in all three export formats (JSON, YAML, Python module), agent validator tool-call extraction across standard and use_tool/call_tool calling conventions, heuristic analysis correctness, and edge cases like empty code and missing policy. The frontend builds cleanly to a 277 KB JS bundle across 110 modules with @dnd-kit drag-and-drop and Zustand state management.

For the AI-powered sprawl analysis, the SprawlAnalyzer was run (using DeepSeek V4 Flash via OpenRouter for this verification pass) against a three-role matrix - admin, developer, viewer - with six tools spanning read, write, and destructive categories. The model returned a sprawl score of 80/100 and surfaced nine issues. Two were critical: the admin role holding both execute_code and delete_resource, and the developer role also having execute_code with no approval gate. The overall analysis named the pattern as excessive concentration of destructive tool access and recommended introducing approval workflows before any destructive operation.

How I Built This Using NEO

This project was built using NEO. NEO is a fully autonomous AI engineering agent that can write code and build solutions for AI/ML tasks, including AI model evals, prompt optimization, and end-to-end AI pipeline development.

The requirement was a visual policy management platform where AI agent teams could define tools, classify risk, assign roles, configure permissions on a drag-and-drop matrix, and export machine-readable policy artifacts - with Claude-powered validation and sprawl analysis built in. NEO planned and produced the files in this repository - a fully async FastAPI backend with 24 routes, three backend services handling policy generation, agent validation, and sprawl analysis, a React and TypeScript frontend with a drag-and-drop permission matrix, six UI components, three Zustand stores, and a 22-test suite covering all major paths. The plans/ directory and ORCHESTRATOR_LOG.md in the repo document that build run directly.

The result is a fully working policy management platform - from tool registration through risk classification, matrix configuration, policy export, and agent validation - with heuristic fallbacks at every AI-powered step so the platform remains useful with or without an API key.

How You Can Use This With NEO

Govern tool access across an existing AI agent team. Any team running multiple agents with different access levels can register their tools, classify them by the six built-in risk categories, and configure a permission matrix without writing a single line of policy code. The matrix validates in real time as roles and permissions are assigned.

Validate existing agent code against a policy. Agent code can be pasted directly into the platform and the validator extracts which tools it actually calls, cross-checks them against the configured matrix, and returns a security score with specific recommendations. The heuristic path works without an API key; the Claude path produces richer analysis when ANTHROPIC_API_KEY is set.

Export a check_permission(role, tool) function directly into agent code. Once the matrix is configured, the Python module export generates a permissions.py file that is syntax-verified before download and can be imported directly into any agent codebase - no manual policy translation required.

Detect permission sprawl in an existing matrix. The sprawl analysis endpoint scores the matrix for over-exposure - roles with too many high-risk tools, tools granted to too many roles, and unused grants. The heuristic path computes numerical metrics without an API key; the Claude path names specific patterns and recommends remediation when ANTHROPIC_API_KEY is set.
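The three numerical signals named above can be computed with plain counting. The following sketch is illustrative only: the HIGH_RISK set, the input shapes, and the sprawl_metrics name are assumptions, and it deliberately stops short of a combined score because the source does not give the scoring formula.

```python
from collections import Counter

# Assumption: which categories count as "high-risk" for sprawl purposes.
HIGH_RISK = {"financial", "destructive", "administrative"}


def sprawl_metrics(
    grants: dict[str, set[str]],
    tool_risk: dict[str, str],
    used: set[tuple[str, str]],
) -> dict[str, object]:
    """Compute per-role, per-tool, and unused-grant sprawl signals."""
    # How many high-risk tools each role holds.
    high_risk_per_role = {
        role: sum(1 for t in tools if tool_risk.get(t) in HIGH_RISK)
        for role, tools in grants.items()
    }
    # How many roles each tool is granted to.
    roles_per_tool = Counter(t for tools in grants.values() for t in tools)
    # Grants that never appear in the observed (role, tool) usage set.
    unused = sorted(
        (role, t)
        for role, tools in grants.items()
        for t in tools
        if (role, t) not in used
    )
    return {
        "high_risk_per_role": high_risk_per_role,
        "roles_per_tool": dict(roles_per_tool),
        "unused_grants": unused,
    }
```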

Final Notes

The gap between "we think our agents have the right permissions" and "we can prove it and export it as code" is where this platform sits. Tool access in AI agent systems is a governance problem that gets harder as teams scale - more agents, more tools, more roles, and no single source of truth. The Tool Permission Matrix Builder & Validator makes that source of truth visual, exportable, and machine-readable.

The code is at https://github.com/dakshjain-1616/Tool-Permission-Matrix-Builder-Validator

You can also build with NEO in your IDE using the VS Code extension or Cursor. You can use NEO MCP with Claude Code: https://heyneo.com/claude-code
