DEV Community

I built a fully local AI assistant at 16 - no cloud, no API keys, runs on your GPU

I'm 16, from Pune, India. For the past couple of years I've been building O-AI - a fully local AI desktop assistant. No cloud. No API keys. No data leaving your machine. Everything runs on your own GPU.

Why I built it

Every AI assistant I tried sent data somewhere. ChatGPT, Copilot, Gemini - all cloud. I wanted something that felt like JARVIS from Iron Man: smart, fast, personal, and private. So I built it from scratch.

What O-AI can do

Core engine

  • Runs LLMs fully on-device via llama.cpp / Ollama (zero internet required)
  • Self-learning core - extracts facts from every conversation and stores them permanently
  • Fine-tuning pipeline - train the model on your own data, locally
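To make the "stores them permanently" part concrete, here is a minimal sketch of a persistent fact store built on Python's standard `sqlite3` module. It is not O-AI's actual code: the table layout and the `open_store`, `remember`, and `recall` helpers are hypothetical, and the fact-extraction step (presumably done by the LLM) is left out entirely.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the fact store. Pass a file path to persist across runs."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS facts ("
        " subject TEXT NOT NULL,"
        " attribute TEXT NOT NULL,"
        " value TEXT NOT NULL,"
        " PRIMARY KEY (subject, attribute))"
    )
    return conn


def remember(conn, subject, attribute, value):
    """Store a fact, replacing any older value for the same subject/attribute."""
    conn.execute(
        "INSERT OR REPLACE INTO facts (subject, attribute, value) VALUES (?, ?, ?)",
        (subject, attribute, value),
    )
    conn.commit()


def recall(conn, subject):
    """Return every known fact about a subject as an attribute -> value dict."""
    rows = conn.execute(
        "SELECT attribute, value FROM facts WHERE subject = ?", (subject,)
    )
    return dict(rows.fetchall())


if __name__ == "__main__":
    store = open_store()  # in-memory here; use e.g. "facts.db" to keep facts on disk
    remember(store, "user", "city", "Pune")
    remember(store, "user", "language", "Marathi")
    print(recall(store, "user"))
```

Keying on (subject, attribute) means a newer fact overwrites an older one instead of piling up duplicates, which is one simple way to handle corrections like "actually, I moved".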

Voice & language

  • Voice control in English, Hindi, and Marathi via Whisper (running locally)
  • Responds in whatever language you speak

Modes

  • JARVIS mode - arc-reactor HUD, 4 reactive states, British-male voice, "sir" persona
  • Take Over PC mode - full desktop automation
  • Animated floating desktop pet (4 types, draggable, reacts to voice)

30+ automation fast-paths: open apps, search the web, control media, screen vision, run code, edit files, cursor control, social media steps, clipboard ops...

Multi-step agent system: plan → execute → verify loop with 14+ step types (web_search, fetch_url, read_screen, run_code, edit_file, open_social, and more)
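A plan → execute → verify loop can be sketched in a few dozen lines. The version below is an illustration under assumptions, not O-AI's implementation: `Step`, `StepResult`, `run_plan`, and the executor/verifier dictionaries are hypothetical names, and only the step-type strings come from the list above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    kind: str  # step type, e.g. "web_search" or "run_code"
    arg: str   # single argument, kept as a string for simplicity


@dataclass
class StepResult:
    step: Step
    output: str
    ok: bool


def run_plan(
    plan: List[Step],
    executors: Dict[str, Callable[[str], str]],
    verifiers: Dict[str, Callable[[str], bool]],
    max_retries: int = 1,
) -> List[StepResult]:
    """Execute steps in order, verify each output, and stop at the first failure."""
    results: List[StepResult] = []
    for step in plan:
        executor = executors.get(step.kind)
        if executor is None:
            results.append(StepResult(step, f"no executor for {step.kind}", False))
            break
        for _attempt in range(max_retries + 1):
            try:
                output = executor(step.arg)
            except Exception as exc:  # a crash counts as a failed attempt
                output, ok = f"error: {exc}", False
            else:
                # Default check: any non-empty output counts as success.
                verify = verifiers.get(step.kind, lambda out: bool(out))
                ok = verify(output)
            if ok:
                break
        results.append(StepResult(step, output, ok))
        if not ok:
            break  # never run later steps on top of a failed one
    return results


if __name__ == "__main__":
    executors = {"web_search": lambda q: f"results for {q}"}
    plan = [Step("web_search", "llama.cpp"), Step("edit_file", "notes.txt")]
    for result in run_plan(plan, executors, verifiers={}):
        print(result.step.kind, "ok" if result.ok else "FAILED", "-", result.output)
```

The key design choice is that success is decided by the verifier looking at the real output, so a step can never be marked done just because it was attempted.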

Stack

  • Backend: Python (Flask IPC + agent core)
  • Frontend: Electron + vanilla JS
  • LLM: llama.cpp / Ollama
  • Voice: Whisper (local) + Edge TTS / neural voice
  • Vision: PIL + screen capture

The hardest bugs

  • "Says done but isn't" - Early versions reported success even when an agent step failed. Fixed by building a proper outcome verifier that checks each step's actual result instead of trusting the plan.
  • The "opens a random video" bug - Asking the agent to play something would open random YouTube videos. Root cause: the plan validator wasn't catching placeholder URLs like [video_url]. Fixed with a universal content guard on all plans.
  • GPU offloading on Windows - Getting all 32 layers onto the GPU with the right CUDA flags took way too long. Worth it though.
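For the placeholder bug, a guard along these lines would reject a plan before it runs. This is a sketch under assumptions rather than O-AI's real content guard: the regex, the dict-based step format, and the `find_placeholders` / `validate_plan` names are all hypothetical.

```python
import re

# Tokens that look like unfilled template slots.
PLACEHOLDER = re.compile(
    r"\[[A-Za-z_][A-Za-z0-9_ ]*\]"          # [video_url]
    r"|<[A-Za-z_][A-Za-z0-9_ ]*>"           # <video_url>
    r"|\{\{?[A-Za-z_][A-Za-z0-9_ ]*\}?\}"   # {video_url} or {{video_url}}
)


def find_placeholders(value):
    """Recursively collect placeholder tokens from strings, lists, and dicts."""
    if isinstance(value, str):
        return PLACEHOLDER.findall(value)
    if isinstance(value, dict):
        return [p for v in value.values() for p in find_placeholders(v)]
    if isinstance(value, (list, tuple)):
        return [p for v in value for p in find_placeholders(v)]
    return []


def validate_plan(plan):
    """Return a list of problems; an empty list means the plan passed the guard."""
    problems = []
    for index, step in enumerate(plan):
        for token in find_placeholders(step):
            problems.append(f"step {index}: unresolved placeholder {token}")
    return problems


if __name__ == "__main__":
    plan = [
        {"type": "fetch_url", "url": "https://www.youtube.com/watch?v=[video_url]"},
        {"type": "web_search", "query": "lofi music"},
    ]
    for problem in validate_plan(plan):
        print(problem)
```

A pattern this broad will also flag legitimate text such as `arr[i]` inside a run_code step, so a real guard would probably limit the check to URL and path fields or whitelist step types that carry code.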

What I learned

Building something real teaches you more than any tutorial. Every bug is a design decision you haven't made yet. If you're not embarrassed by v1, you shipped too late.

Follow along

  • GitHub: github.com/Shriisoot
  • Portfolio + TheLab: sankalpkulkarni.com
  • Instagram: @shriisoot

If you're building something local-first with LLMs, drop a comment - I'd love to compare notes.
