DEV Community

The 5% Router Tax: What Hosted LLM Gateways Charge For (and How to Self-Host It)

What the Fee Buys

  • Unified API across providers - one format in, translated per-provider out.
  • Failover - a provider 500s, your request retries elsewhere.
  • Model marketplace - new models available the day they launch.
  • Consolidated billing - one invoice instead of six provider accounts.
  • (Sometimes) smart routing - OpenRouter's auto router picks a model per-request.

Items 1, 2, and 5 are software. Items 3 and 4 are genuinely hard to self-host - if you want day-one access to every new model with zero account setup, the marketplace earns its fee. But most coding workloads use a handful of models, not four hundred.

The Parts a Hosted Router Structurally Can't Give You

  • Local models as a tier. No hosted router will route your easy requests to the Ollama instance on your own machine - free, private, zero latency to first byte on cached weights. For coding traffic, where (in my instrumented sessions) 70โ€“90% of requests are simple enough for a good local model, this is the single biggest cost lever, and it's only available to something running on your side of the wire.
  • Your data staying home. Self-hosted means prompts, code, and keys never transit a third party. For anyone with a compliance requirement - or code they'd rather not ship to a router's logs - this isn't a preference, it's a prerequisite.
  • Token optimization before the bill. A hosted router bills you for the tokens you send it - it has no incentive to shrink them. A self-hosted proxy can strip unusable tool schemas (measured: โˆ’53% on tool-heavy requests) and compress JSON tool results (measured: 3,458 โ†’ 427 tokens on a grep result) before any provider bills you. That's not a routing saving; it stacks on top of routing.
  • No availability dependency. Hosted routers go down (OpenRouter's outages have their own HN threads) and offer no SLA at consumer tiers. A local proxy fails independently of anyone's status page.

What Self-Hosting Costs You

Honesty cuts both ways:

  • You run a process. npm install -g lynkr && lynkr init && lynkr start - but it's yours now: updates, logs, the works.
  • You manage provider accounts. Two or three API keys instead of one. The consolidated invoice is genuinely gone.
  • Model lag. A new provider means waiting for support (or a PR) instead of it appearing in a dropdown.
  • Nobody to email. Self-hosted support is a GitHub issue tracker.

If those trade-offs read as "fine," the math is straightforward: the 5% fee disappears, the local-tier routing removes the easy majority of requests from your bill entirely, and compression shrinks what's left.

The Hybrid That Actually Makes Sense

This isn't either/or. A pattern I see working:

Coding tool โ†’ self-hosted proxy (Lynkr)
โ”œโ”€ SIMPLE/MEDIUM โ†’ local Ollama/llama.cpp (free)
โ”œโ”€ COMPLEX โ†’ direct provider API keys (no fee)
โ””โ”€ exotic models โ†’ OpenRouter (5% on the long tail only)

Keep a hosted router as one backend for the long tail of models you rarely need, route the bulk directly or locally, and let the proxy's classifier decide per-request. You get the marketplace when you want it without paying the tax on your entire volume.

Lynkr is Apache-2.0, self-hosted, supports 13 providers including Ollama, llama.cpp, LM Studio, Bedrock, Azure, Databricks - and OpenRouter itself as a tier: github.com/Fast-Editor/Lynkr. Benchmarks with methodology are in the repo; run them on your own workload before believing anyone's percentages, including mine.

Comments

No comments yet. Start the discussion.