DEV Community

How I Built and Secured a Self-Hosted Stack

I built and operate a 13-service self-hosted platform on a single Linux VPS: a personal AI chat interface, budgeting, RSS, notes, bookmarks, uptime monitoring, a dashboard, dev utilities - and a self-hosted autonomous AI agent. Everything sits behind one reverse proxy with automatic HTTPS, most of it behind single sign-on, and the whole thing is captured as Docker Compose config that survives reboots and rebuilds.

Up front, honestly: this is a personal, self-directed project, and I'd put my level at junior / early-career. I designed and run it, but it isn't an audited, production-grade environment. The value I'd point a reviewer to isn't enterprise completeness - it's the reasoning. So I'm going to lead with the part I cared most about: containing the AI agent.

The decision I'm proudest of (and the mistake behind it)

The interesting security question in this stack is: how do you contain a thing that's actively trying to get around your controls? The agent - Hermes, by Nous Research - has persistent memory and tool use: it can execute code, browse, and run web searches. It legitimately needs exactly two things from the rest of the stack: the chat front-end (to talk to) and the private metasearch service (to search). It does not need the database, the notes app, the budget data, or the host.

The mistake I caught

In the first iteration, the agent was sitting on the shared application network alongside everything else - which meant it had a network path to the database port. I didn't intend that; it was just the default outcome of dropping it on the same network as the apps.

What I did instead of panicking

I stopped and reasoned about the actual blast radius. The database password was never readable by the agent - it lives in the database/Compose environment, not on the agent's filesystem or in anything its tools could read. What was exposed was an open port: a reachable network path to a data store from a code-executing agent. So the real exposure was narrower than "the agent can read my database." But that distinction doesn't earn the path a pass. Least privilege says an autonomous agent shouldn't have a route to a data store it has no reason to touch - full stop - regardless of whether I currently believe the credentials are safe.

The re-architecture

I moved the agent onto a dedicated, isolated Docker network (hermes-net):

  • The chat front-end and metasearch service join both networks, so they bridge between the app stack and the agent's network.
  • The agent sits on its isolated network only - among the stack's services, it can reach exactly those two and nothing else.
  • The posture is default-closed: granting the agent a new service is a deliberate, one-at-a-time action (docker network connect hermes-net <service>), and the database is intentionally never on that list.
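To make the topology concrete, here is a minimal Python sketch of the rule it relies on: two containers have a direct path only if they share a Docker network. Only hermes-net comes from the setup described above; app-net and the service names are placeholders made up for illustration, and the model ignores outbound internet access and anything beyond direct container-to-container paths.

```python
# Which Docker networks each service joins.
# Only "hermes-net" is named in the post; every other name is a placeholder.
AFTER = {
    "agent":  {"hermes-net"},
    "chat":   {"app-net", "hermes-net"},
    "search": {"app-net", "hermes-net"},
    "db":     {"app-net"},
    "notes":  {"app-net"},
}

# The first iteration: the agent dropped onto the shared application network.
BEFORE = {**AFTER, "agent": {"app-net"}}


def can_reach(src, dst, networks):
    """Containers have a direct network path only if they share a network."""
    return bool(networks[src] & networks[dst])


def reachable_from(src, networks):
    """Every other service that src shares at least one network with."""
    return sorted(
        name for name in networks
        if name != src and can_reach(src, name, networks)
    )


print(reachable_from("agent", BEFORE))  # ['chat', 'db', 'notes', 'search']
print(reachable_from("agent", AFTER))   # ['chat', 'search']
```

The model covers direct paths only: the two dual-homed services are still where traffic could cross between networks, so they deserve the same scrutiny as the boundary itself.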

Why a network boundary beats a "soft" gate

Here's the principle the whole design rests on: An autonomous agent with tool use will try multiple routes around a soft "are you sure?" block. The boundary that actually holds is the one it can't reason its way past.

The agent ships with an in-app approve/deny prompt - but that gate is part of its native UI, and it isn't even reachable when the agent is driven through its API, which is how the chat interface talks to it. So the soft gate is doubly weak: it can be routed around, and on the path I actually use it isn't in the loop at all.

A network boundary has neither weakness. If there's no route, there's nothing to negotiate and no alternate path to find. That's the lesson I'd most want a reviewer to take from this: I didn't just flip a safety toggle and trust it - I reasoned about whether it could be bypassed, decided it could, and moved the boundary somewhere it couldn't.

Defense in depth around the agent

Network isolation is the main wall; these measures limit what a failure could do even inside it:

  • Filesystem isolation - the agent's terminal runs inside its own container, scoped to its own data directory. It can't see the host or other containers' files.
  • In-container terminal, not the Docker socket - wiring it to a Docker backend would mean handing it host control. Deliberately not done.
  • Internal-only API - its OpenAI-compatible endpoint is on an unpublished port with no reverse-proxy route, reachable only by containers on its own network, and it still requires an API key.
  • Outbound-only, allow-listed remote access - phone access is a Telegram bot using outbound polling (no inbound ports), locked to a single allow-listed user ID. The "allow any user" flag is never set.
  • Image pinned by digest - updates are deliberate, reviewed pulls, not whatever :latest happens to be that day.
  • A hard cost ceiling - a credit cap on the model provider bounds runaway token spend if a scheduled job loops.
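Three of these controls (digest pinning, no Docker socket, no published port) are visible in the agent's Compose service definition, so they can be checked mechanically. A rough sketch, assuming the service entry has already been loaded into a Python dict; audit_service, volume_source, the example image names, and the all-zero digest are hypothetical and not taken from the real config.

```python
DOCKER_SOCKET = "/var/run/docker.sock"


def volume_source(volume):
    """Return the host-side source of a Compose volume entry.

    Handles the short string form ("source:target[:mode]") and the long
    mapping form ({"source": ..., "target": ...}).
    """
    if isinstance(volume, dict):
        return volume.get("source", "")
    return str(volume).split(":")[0]


def audit_service(service):
    """List the ways a Compose-style service dict breaks the three rules."""
    findings = []
    if "@sha256:" not in service.get("image", ""):
        findings.append("image is not pinned by digest")
    if any(volume_source(v) == DOCKER_SOCKET for v in service.get("volumes", [])):
        findings.append("Docker socket is mounted")
    if service.get("ports"):
        findings.append("ports are published on the host")
    return findings


# Hypothetical examples; the digest is a placeholder, not a real image.
hardened = {
    "image": "example/agent@sha256:" + "0" * 64,
    "volumes": ["agent-data:/data"],
}
risky = {
    "image": "example/agent:latest",
    "volumes": ["/var/run/docker.sock:/var/run/docker.sock"],
    "ports": ["8000:8000"],
}

print(audit_service(hardened))  # []
print(audit_service(risky))     # three findings, one per broken rule
```

A check like this only covers what is written in the config; the API key, the Telegram allow-list, and the provider credit cap live elsewhere and need their own verification.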

Host hardening (the foundation under everything)

A few decisions worth calling out, each here for a reason rather than because a guide said so:

SSH: key-only, with a tested escape hatch. ed25519 keys only; password auth and root-password login disabled. The part that mattered more than the config: I set up and tested the provider's out-of-band recovery console before disabling password login. Test the escape hatch first, then flip the switch.

ufw: default-deny, and verifying what's actually open. The firewall is default-deny with explicit allows: rate-limited SSH and 80/443. The lesson came when I found Docker's API ports (2375/2376) showing as open - stray rules, with nothing listening behind them - and removed them. Docker writes its own iptables rules directly, so it can punch holes you didn't author (a published container port can bypass ufw entirely). That's why I verified what was actually exposed rather than trusting that my config described reality.
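One way to check exposure from the outside is a plain TCP connect probe, run from a machine other than the VPS. This is only a sketch: is_port_open and unexpected_open_ports are hypothetical helper names. A connect probe only reports ports that are both allowed through and have something listening, so it complements reading the rule set rather than replacing it. It would not flag a stray allow rule with nothing behind it.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, filtered (timed out), or unreachable all count as closed.
        return False


def unexpected_open_ports(host, ports_to_probe, expected_open):
    """Ports that accept connections but are not on the intended list."""
    return sorted(
        port for port in ports_to_probe
        if port not in expected_open and is_port_open(host, port)
    )
```

Called as unexpected_open_ports("<your-host>", [22, 80, 443, 2375, 2376], expected_open={22, 80, 443}), an empty list means nothing beyond the intended ports answered.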

fail2ban. Honestly, with passwords already disabled this is more about cutting log noise than a hard security gain - but it's cheap, correct, and the right default.

One reverse proxy, one login, one deliberate exception

The internet only ever talks to Caddy, on ports 80/443. Caddy terminates TLS (certs auto-issued via Let's Encrypt) and reverse-proxies each subdomain to the right internal container. Most services sit behind Tinyauth single sign-on, implemented as reverse-proxy forward-auth - one login sets a cookie scoped to the parent domain that covers every gated subdomain:

# reusable snippet, imported into each gated app's block
(tinyauth) {
    forward_auth tinyauth:3000 {
        uri /api/auth/caddy
    }
}
