DEV Community
Building M31A: A Terminal-Native AI Coding Agent That Ships, Not Just Suggests
Most AI coding assistants are glorified autocomplete on steroids. They suggest code, maybe write a function or two, but leave you holding the bag when it comes to testing, verification, and actually shipping the changes.

M31A (M31 Autonomous) takes a different approach. It's a terminal-based AI coding agent written in Go that owns a six-phase workflow end-to-end: Initialize → Discuss → Plan → Execute → Verify → Ship. Every run ends with a verified git commit and a learning ledger entry. One static binary, zero telemetry, any POSIX shell.

In this post, I'll walk you through the architecture, design decisions, and technical highlights of this open-source project.

The Problem: AI Assistants That Don't Finish the Job

Here's the typical workflow with most AI coding tools:

1. Ask the AI to write some code
2. Copy-paste the suggestion into your editor
3. Run tests manually
4. Debug the inevitable issues
5. Repeat until it works
6. Commit the changes yourself

The AI "helped" with step 1, but you're still doing 80% of the work. And if something breaks three commits later? Good luck figuring out what the AI actually changed.

M31A flips this model.
Instead of being a suggestion engine, it's an autonomous agent that:

- Asks clarifying questions before planning
- Generates a structured implementation plan
- Executes tasks with proper dependency resolution
- Runs verification (tests, syntax checks)
- Commits verified changes to git
- Records what it learned for future sessions

Architecture at a Glance

M31A is built with a clean six-layer architecture:

    ┌──────────────────────────────────────────────────────────┐
    │  TUI Layer (Bubble Tea)                                  │
    │  29 screens, keyboard/mouse handling, streaming display  │
    └──────────────────────────────────────────────────────────┘
                               │
    ┌──────────────────────────────────────────────────────────┐
    │  Workflow Engine                                         │
    │  Six-phase orchestration, LLM streaming, plan parsing    │
    └──────────────────────────────────────────────────────────┘
                               │
            ┌──────────────────┼─────────────────────┐
            │                  │                     │
    ┌──────────────┐   ┌──────────────┐   ┌────────────────────┐
    │  Providers   │   │  Tools       │   │  Domain Packages   │
    │  OpenRouter  │   │  Bash        │   │  session, ledger   │
    │  Zen         │   │  FileRead    │   │  rollback, bisect  │
    │  Fallback    │   │  FileWrite   │   │  taskrunner        │
    │              │   │  Glob, Grep  │   │  keychain          │
    └──────────────┘   └──────────────┘   └────────────────────┘
            │                  │                     │
    ┌──────────────────────────────────────────────────────────┐
    │  Infrastructure Layer                                    │
    │  git, config, tokens, codeintel, fileutil, logging       │
    └──────────────────────────────────────────────────────────┘

The key insight? Separation of concerns at every level. The TUI doesn't know about LLM APIs. The workflow engine doesn't know about terminal rendering. The tools don't know about workflow phases.

The Six-Phase Workflow Engine

The heart of M31A is the workflow engine, implemented in internal/workflow/engine.go.
Let's break down each phase:

Phase 1: Initialize

The agent detects your project type (Go, Python, Node, etc.), initializes git if needed, and creates a .m31a/ planning directory with:

- PROJECT.md: project metadata
- STATE.md: current workflow state
- TASKS.md: task list (populated later)

    // From internal/workflow/initialize.go
    func (e *Engine) runInitialize(ctx context.Context) error {
        // Detect project type, framework, language
        project := e.detectProject()

        // Initialize git repo if needed
        if !e.git.IsRepository() {
            e.git.Init()
        }

        // Create planning directory
        if err := os.MkdirAll(e.planningDir, 0755); err != nil {
            return err
        }

        // Write PROJECT.md, STATE.md
        e.writeProjectState(project)
        return nil
    }

Phase 2: Discuss

Before jumping into code, the agent asks clarifying questions via LLM streaming. This prevents the classic "I built exactly what you asked for, but not what you wanted" problem.

The discuss phase uses embedded prompt templates (loaded via //go:embed prompts/*.md) to guide the LLM toward asking useful questions about scope, constraints, and edge cases.

Phase 3: Plan

The agent generates a structured implementation plan in markdown format. A custom parser (internal/workflow/plan_parser.go) extracts:

- Task titles and descriptions
- Dependencies between tasks
- Files that will be modified
- Review notes and questions

    // From internal/workflow/plan_parser.go
    type Plan struct {
        Title     string
        Tasks     []Task
        Questions []string
        Notes     string
    }

    type Task struct {
        ID           int
        Action       string
        Description  string
        Files        []string
        Dependencies []int
    }

The plan parser supports refinement with retry logic (max 3 retries, max 5 refinements) and classifies prompt complexity: trivial → simple → moderate → complex.

Phase 4: Execute

This is where the rubber meets the road.
The task runner (pkg/taskrunner/runner.go) uses Kahn's algorithm for topological sorting to determine execution order:

    // From pkg/taskrunner/runner.go
    func (r *Runner) Schedule() ([][]int, error) {
        // Build adjacency list and in-degree count
        inDegree := make(map[int]int)
        dependents := make(map[int][]int)
        for _, t := range r.tasks {
            for _, dep := range t.Dependencies {
                inDegree[t.ID]++
                dependents[dep] = append(dependents[dep], t.ID)
            }
        }

        // Find all tasks with no dependencies
        var queue []int
        for _, t := range r.tasks {
            if inDegree[t.ID] == 0 {
                queue = append(queue, t.ID)
            }
        }

        // Process tasks in topological order, one group per level
        var groups [][]int
        scheduled := 0
        for len(queue) > 0 {
            groups = append(groups, queue)
            scheduled += len(queue)
            var next []int
            for _, id := range queue {
                for _, dep := range dependents[id] {
                    inDegree[dep]--
                    if inDegree[dep] == 0 {
                        next = append(next, dep)
                    }
                }
            }
            queue = next
        }

        // Leftover tasks mean a cycle or a dependency on an unknown task
        if scheduled != len(r.tasks) {
            return nil, fmt.Errorf("unresolvable dependencies: scheduled %d of %d tasks", scheduled, len(r.tasks))
        }
        return groups, nil
    }

Tasks within a group can run with bounded parallelism (default: 4 concurrent tasks via semaphore). The executor includes a self-heal loop that retries recoverable failures up to 2 times.

Phase 5: Verify

The agent runs verification checks:

- File existence validation
- Syntax checking (language-specific)
- Test execution
- Smart file truncation for LLM context

If verification fails, the agent can roll back the commit chain using git-bisect integration.

Phase 6: Ship

The final phase:

- Creates a git commit with all verified changes
- Writes a ledger entry (cross-session learning record)
- Archives the session
- Generates a demonstration summary

Provider System: Multi-LLM with Automatic Fallback

M31A supports two LLM providers out of the box:

- OpenRouter: primary gateway with access to Claude, GPT-4, etc.
- Zen: secondary provider (OpenCode Zen)

The provider layer (internal/provider/) includes some clever engineering:

Automatic Fallback

When a provider degrades (429 rate limit, 503 service unavailable), M31A automatically switches to a healthy provider.
The fallback logic uses parallel health checks to minimize latency:

    // From internal/provider/fallback.go
    func FindFallbackProvider(registry *Registry, current string) (string, *FallbackEvent, error) {
        // Collect candidate providers
        candidates := registry.ListAll()

        // Parallel health checks (10s timeout)
        ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
        defer cancel()

        ch := make(chan result, len(candidates))
        for _, c := range candidates {
            go func(c candidate) {
                status := c.provider.HealthCheck(ctx)
                ch <- [...]

[...]

        [...] > 3 {
            level = boostLevel(level, 1)
        }

        // Boost when task has many dependencies
        if len(task.Dependencies) > 3 {
            level = boostLevel(level, 1)
        }

        input, output := s.EstimateTokens(level, task)
        return level, input + output
    }

The scorer uses keyword analysis to classify tasks as simple, moderate, or complex, then recommends the cheapest model that can handle that complexity level.

Tool System: Deliberately Small, Aggressively Sandboxed

M31A ships with 5 core tools:

- Bash: shell command execution
- FileRead: read files with size limits (50MB max)
- FileWrite: atomic file writes (temp + rename)
- Glob: file pattern matching (doublestar, 1000 result limit)
- Grep: content search (ripgrep when available, pure-Go fallback)

The tool surface area is intentionally small.
Each tool is aggressively sandboxed with:

Permission Gating

Every tool call is gated by a permission modal with a configurable timeout (default 300s):

    // From internal/tools/permissions.go
    type PermissionMode string

    const (
        ModeAsk      PermissionMode = "ask"
        ModeAllowAll PermissionMode = "allow_all"
        ModeDenyAll  PermissionMode = "deny_all"
    )

    func (d *Dispatcher) RequestPermission(ctx context.Context, tool Tool, input ToolInput) error {
        if d.mode == ModeAllowAll {
            return nil
        }

        // Send permission request to TUI
        ch := make(chan PermissionResponse)
        d.emitter.Emit(PermissionRequestMsg{...})

        // Wait for user response with timeout
        select {
        case resp := <- [...]

[...]/.m31a/session.json, including:

- Workflow state (goal, phase, questions)
- Message history (separate messages.json)
- Checkpoints (max 2 for undo/rollback)

If you hit Ctrl+C, lose network, or your laptop dies, you can resume mid-workflow:

    $ m31a --resume
    # Shows session browser with recent sessions
    # Restores workflow state and continues from last checkpoint

Testing Strategy

M31A uses Go's standard testing package with no external mocking frameworks:

- Unit tests: individual functions/methods
- Integration tests: real git repos, temp dirs, HTTP test servers
- Security tests: SSRF protection, timeout enforcement, path traversal
- Table-driven tests: anonymous structs with t.Parallel()

Coverage targets:

- Overall: 75% (currently ~74.7%)
- Critical packages: 90% (pkg/taskrunner at 89.9%, pkg/bisect at 91.3%, pkg/rollback at 89.1%)

The test suite includes some interesting patterns:

    // Security test for SSRF protection
    func TestWebFetch_BlocksPrivateIPs(t *testing.T) {
        tests := []struct {
            url     string
            wantErr error
        }{
            {"http://127.0.0.1/admin", ErrPrivateIPBlocked},
            {"http://192.168.1.1/config", ErrPrivateIPBlocked},
            {"http://10.0.0.1/secret", ErrPrivateIPBlocked},
            {"http://169.254.169.254/metadata", ErrPrivateIPBlocked}, // AWS metadata
        }
        for _, tt := range tests {
            tt := tt // capture the range variable for the parallel subtest (needed before Go 1.22)
            t.Run(tt.url, func(t *testing.T) {
                t.Parallel()
                _, err := WebFetch(tt.url)
                if !errors.Is(err, tt.wantErr) {
                    t.Errorf("got %v, want %v", err, tt.wantErr)
                }
            })
        }
    }

Getting Started

Installation is a one-liner:

    # macOS