I built a fully local AI assistant at 16 - no cloud, no API keys, runs on your GPU
I'm 16, from Pune, India. For the past couple of years I've been building O-AI - a fully local AI desktop assistant. No cloud. No API keys. No data leaving your machine. Everything runs on your own GPU.
Why I built it
Every AI assistant I tried sent data somewhere. ChatGPT, Copilot, Gemini - all cloud. I wanted something that felt like JARVIS from Iron Man: smart, fast, personal, and private. So I built it from scratch.
What O-AI can do
Core engine
- Runs LLMs fully on-device via llama.cpp / Ollama (zero internet required)
- Self-learning core - extracts facts from every conversation and stores them permanently
- Fine-tuning pipeline - train the model on your own data, locally
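To make the "self-learning core" idea concrete, here is a minimal sketch of extracting facts from a message and storing them permanently. It is not O-AI's actual code: the regex, the table layout, and the names `open_store`, `remember`, and `recall` are all assumptions for illustration, and a toy pattern stands in for whatever extraction the real assistant uses.

```python
import re
import sqlite3

# Hypothetical pattern: catches statements like "my editor is Neovim".
FACT_PATTERN = re.compile(r"\bmy (\w[\w ]*?) is ([^.,!?]+)", re.IGNORECASE)


def open_store(path=":memory:"):
    """Open (or create) the fact table. Pass a file path to persist across runs."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS facts (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
    )
    return conn


def extract_facts(message):
    """Return (key, value) pairs found in one user message."""
    return [(k.strip().lower(), v.strip()) for k, v in FACT_PATTERN.findall(message)]


def remember(conn, message):
    """Extract facts from a message and store them; the newest value wins."""
    facts = extract_facts(message)
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO facts (key, value) VALUES (?, ?)", facts
        )
    return facts


def recall(conn, key):
    """Look up a stored fact by key, or return None if nothing is known."""
    row = conn.execute(
        "SELECT value FROM facts WHERE key = ?", (key.lower(),)
    ).fetchone()
    return row[0] if row else None
```

Calling `open_store("facts.db")` instead of the in-memory default keeps the facts on disk between sessions, which is the "permanent" part. Everything stays in a local SQLite file, so nothing leaves the machine.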
Voice & language
- Voice control in English, Hindi, and Marathi via Whisper (running locally)
- Responds in whatever language you speak
Modes
- JARVIS mode - arc-reactor HUD, 4 reactive states, British-male voice, "sir" persona
- Take Over PC mode - full desktop automation
- Animated floating desktop pet (4 types, draggable, reacts to voice)
30+ automation fast-paths: open apps, search the web, control media, screen vision, run code, edit files, cursor control, social media steps, clipboard ops...
Multi-step agent system: plan → execute → verify loop with 14+ step types (web_search, fetch_url, read_screen, run_code, edit_file, open_social, and more)
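A plan → execute → verify loop can be sketched roughly like this. This is an illustration under assumptions, not O-AI's implementation: the `Step` and `StepResult` classes, the handler and verifier tables, and `run_plan` are hypothetical names, and the plan is passed in ready-made where a real agent would ask the LLM to produce it.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Step:
    kind: str                                   # e.g. "web_search", "fetch_url"
    args: Dict[str, Any] = field(default_factory=dict)


@dataclass
class StepResult:
    step: Step
    ok: bool
    output: Any = None
    error: str = ""


def run_plan(
    plan: List[Step],
    handlers: Dict[str, Callable[..., Any]],
    verifiers: Dict[str, Callable[[Any], bool]],
) -> List[StepResult]:
    """Execute steps in order, verifying each real output before moving on."""
    results: List[StepResult] = []
    for step in plan:
        handler = handlers.get(step.kind)
        if handler is None:
            results.append(StepResult(step, False, error=f"unknown step type: {step.kind}"))
            break
        try:
            output = handler(**step.args)
        except Exception as exc:  # a crashed step is a failed step, not a "done"
            results.append(StepResult(step, False, error=str(exc)))
            break
        # Verify against what actually came back, not what the plan promised.
        verify = verifiers.get(step.kind, lambda out: out is not None)
        ok = bool(verify(output))
        results.append(StepResult(step, ok, output, "" if ok else "verification failed"))
        if not ok:
            break
    return results


def succeeded(plan: List[Step], results: List[StepResult]) -> bool:
    """A plan only counts as done if every step ran and every step verified."""
    return len(results) == len(plan) and all(r.ok for r in results)
```

Each step type gets one handler and, optionally, one verifier, so adding a new step type means adding entries to two dictionaries. The loop stops at the first step that fails or does not verify, and `succeeded` only returns true when the whole plan ran and every step passed.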
Stack
- Backend: Python (Flask IPC + agent core)
- Frontend: Electron + vanilla JS
- LLM: llama.cpp / Ollama
- Voice: Whisper (local) + Edge TTS / neural voice
- Vision: PIL + screen capture
The hardest bugs
- "Says done but isn't" - Early versions reported success even when an agent step failed. Fixed by building a proper outcome verifier that reads the actual result, not the plan.
- The "opens a random video" bug - Asking the agent to play something would open random YouTube videos. Root cause: the plan validator wasn't catching placeholder URLs like [video_url]. Fixed with a universal content guard on all plans.
- GPU offloading on Windows - Getting all 32 layers onto the GPU with the right CUDA flags took way too long. Worth it though.
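For the placeholder bug, a content guard could look something like the sketch below. It is a guess at the general shape, not the actual fix: the regex covers a few assumed placeholder styles ([video_url], <url>, {query}, {{query}}), the plan is assumed to be a list of dicts, and `find_placeholders` and `guard_plan` are hypothetical names.

```python
import re
from typing import Any, List

# Assumed placeholder shapes: [video_url], <url>, {query}, {{query}}.
# This is a heuristic; a real guard would need tuning against false positives.
PLACEHOLDER = re.compile(
    r"\[[a-z_]+\]|<[a-z_]+>|\{\{?\s*[a-z_]+\s*\}?\}", re.IGNORECASE
)


def find_placeholders(value: Any, path: str = "plan") -> List[str]:
    """Walk nested dicts/lists and return the paths of strings holding a placeholder."""
    hits: List[str] = []
    if isinstance(value, str):
        if PLACEHOLDER.search(value):
            hits.append(path)
    elif isinstance(value, dict):
        for key, item in value.items():
            hits.extend(find_placeholders(item, f"{path}.{key}"))
    elif isinstance(value, (list, tuple)):
        for index, item in enumerate(value):
            hits.extend(find_placeholders(item, f"{path}[{index}]"))
    return hits


def guard_plan(plan: Any) -> Any:
    """Reject a plan before execution if any string still contains a placeholder."""
    hits = find_placeholders(plan)
    if hits:
        raise ValueError("unresolved placeholders at: " + ", ".join(hits))
    return plan
```

Because the walk is recursive and ignores step types, the same check applies to every plan no matter which steps it contains. A plan whose URL is still `https://www.youtube.com/watch?v=[video_url]` gets rejected before anything is opened, and the error message names the offending field.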
What I learned
Building something real teaches you more than any tutorial. Every bug is a design decision you haven't made yet. If you're not embarrassed by v1, you shipped too late.
Follow along
- GitHub: github.com/Shriisoot
- Portfolio + TheLab: sankalpkulkarni.com
- Instagram: @shriisoot
If you're building something local-first with LLMs, drop a comment - I'd love to compare notes.