The 5% Router Tax: What Hosted LLM Gateways Charge For (and How to Self-Host It)
What the Fee Buys
- Unified API across providers - one format in, translated per-provider out.
- Failover - a provider 500s, your request retries elsewhere.
- Model marketplace - new models available the day they launch.
- Consolidated billing - one invoice instead of six provider accounts.
- (Sometimes) smart routing - OpenRouter's auto router picks a model per-request.
Items 1, 2, and 5 (unified API, failover, smart routing) are software you can run yourself. Items 3 and 4 (the marketplace and consolidated billing) are genuinely hard to self-host - if you want day-one access to every new model with zero account setup, the marketplace earns its fee. But most coding workloads use a handful of models, not four hundred.
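To make "failover is software" concrete, here is a minimal sketch of the retry-elsewhere loop. Everything in it is a stand-in: `ProviderError`, `complete_with_failover`, and the two toy backends are hypothetical names for illustration, not Lynkr's or any router's actual code. A real version would wrap HTTP clients and decide which status codes count as retryable.

```python
from typing import Callable, List, Tuple


class ProviderError(Exception):
    """Retryable backend failure, e.g. an HTTP 5xx from a provider."""


Backend = Tuple[str, Callable[[str], str]]


def complete_with_failover(prompt: str, backends: List[Backend]) -> Tuple[str, str]:
    """Try each backend in order; return (backend_name, reply) from the first success."""
    failures = []
    for name, call in backends:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            # Remember why this backend failed, then fall through to the next one.
            failures.append(f"{name}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(failures))


# Stand-in backends (hypothetical); real ones would make HTTP calls.
def flaky_backend(prompt: str) -> str:
    raise ProviderError("HTTP 500")


def healthy_backend(prompt: str) -> str:
    return f"echo: {prompt}"


used, reply = complete_with_failover(
    "hello", [("primary", flaky_backend), ("secondary", healthy_backend)]
)
```

The ordered list is the whole policy here: put the cheapest or most trusted backend first and the fallback after it.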
The Parts a Hosted Router Structurally Can't Give You
- Local models as a tier. No hosted router will route your easy requests to the Ollama instance on your own machine - free, private, zero latency to first byte on cached weights. For coding traffic, where (in my instrumented sessions) 70-90% of requests are simple enough for a good local model, this is the single biggest cost lever, and it's only available to something running on your side of the wire.
- Your data staying home. Self-hosted means prompts, code, and keys never transit a third party. For anyone with a compliance requirement - or code they'd rather not ship to a router's logs - this isn't a preference, it's a prerequisite.
- Token optimization before the bill. A hosted router bills you for the tokens you send it - it has no incentive to shrink them. A self-hosted proxy can strip unusable tool schemas (measured: -53% on tool-heavy requests) and compress JSON tool results (measured: 3,458 → 427 tokens on a grep result) before any provider bills you. That's not a routing saving; it stacks on top of routing.
- No availability dependency. Hosted routers go down (OpenRouter's outages have their own HN threads) and offer no SLA at consumer tiers. A local proxy fails independently of anyone's status page.
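The token-optimization point above can be sketched in a few lines. This is an illustration under stated assumptions, not Lynkr's implementation: `prune_tools` and `compact_json` are hypothetical helpers, the allow-list is something you would define yourself, and the sketch compares character counts because real token counts depend on the provider's tokenizer.

```python
import json
from typing import Dict, List, Set


def prune_tools(tools: List[Dict], allowed: Set[str]) -> List[Dict]:
    """Drop tool schemas the client cannot use (assumption: an allow-list by name)."""
    return [tool for tool in tools if tool.get("name") in allowed]


def compact_json(text: str) -> str:
    """Re-serialize a JSON tool result without indentation or padding.

    Non-JSON input is returned unchanged so the proxy never corrupts a result.
    """
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return text
    return json.dumps(parsed, separators=(",", ":"))


# Hypothetical request pieces, for illustration only.
tools = [
    {"name": "grep", "description": "search files"},
    {"name": "browser", "description": "open a URL"},
]
pruned = prune_tools(tools, allowed={"grep"})

pretty_result = json.dumps({"matches": [{"file": "a.py", "line": 3}]}, indent=4)
small_result = compact_json(pretty_result)
```

Whitespace stripping is the most conservative form of compression since the parsed data is identical; anything more aggressive (dropping fields, truncating matches) changes what the model sees and deserves its own measurement.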
What Self-Hosting Costs You
Honesty cuts both ways:
- You run a process. Setup is one line (npm install -g lynkr && lynkr init && lynkr start), but it's yours now: updates, logs, the works.
- You manage provider accounts. Two or three API keys instead of one. The consolidated invoice is genuinely gone.
- Model lag. A new provider means waiting for support (or a PR) instead of it appearing in a dropdown.
- Nobody to email. Self-hosted support is a GitHub issue tracker.
If those trade-offs read as "fine," the math is straightforward: the 5% fee disappears, the local-tier routing removes the easy majority of requests from your bill entirely, and compression shrinks what's left.
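Here is that math as a small sketch you can plug your own numbers into. The inputs below are hypothetical, chosen only to show how the three effects stack, and the model makes a loud simplifying assumption: spend is proportional to request share. Simple requests are usually cheaper than complex ones, so moving 70% of requests local will likely cut less than 70% of spend.

```python
def monthly_cost(
    provider_spend: float,
    router_fee: float = 0.0,
    local_share: float = 0.0,
    token_reduction: float = 0.0,
) -> float:
    """Estimate a monthly bill.

    provider_spend:  what the providers would charge with no optimization
    router_fee:      fractional markup charged by a hosted router (0.05 = 5%)
    local_share:     fraction of spend moved to a free local model (assumption:
                     spend scales with request share)
    token_reduction: fraction of remaining tokens removed before billing
    """
    billable = provider_spend * (1 - local_share) * (1 - token_reduction)
    return billable * (1 + router_fee)


# Hypothetical inputs, not measurements.
hosted = monthly_cost(1000.0, router_fee=0.05)
self_hosted = monthly_cost(1000.0, local_share=0.7, token_reduction=0.2)
fee_only_saving = hosted - monthly_cost(1000.0)
```

With these inputs the fee itself is the smallest of the three terms; most of the difference comes from the local tier, which is why the assumption about request share versus spend share matters so much.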
The Hybrid That Actually Makes Sense
This isn't either/or. A pattern I see working:
Coding tool → self-hosted proxy (Lynkr)
  ├─ SIMPLE/MEDIUM → local Ollama/llama.cpp (free)
  ├─ COMPLEX → direct provider API keys (no fee)
  └─ exotic models → OpenRouter (5% on the long tail only)
Keep a hosted router as one backend for the long tail of models you rarely need, route the bulk directly or locally, and let the proxy's classifier decide per-request. You get the marketplace when you want it without paying the tax on your entire volume.
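A rough sketch of that per-request decision, to show the shape of the hybrid rather than how Lynkr does it. The word-count `classify` function is a deliberately toy heuristic, and `DIRECT_MODELS`, the backend labels, and the thresholds are all hypothetical; a real classifier would look at tool use, context size, and task type.

```python
from typing import Optional

# Tier-to-backend table mirroring the diagram above (labels are hypothetical).
ROUTES = {
    "SIMPLE": "local",
    "MEDIUM": "local",
    "COMPLEX": "direct",
}
LONG_TAIL_BACKEND = "openrouter"

# Models you hold direct API keys for (placeholder names).
DIRECT_MODELS = {"model-a", "model-b"}


def classify(prompt: str) -> str:
    """Toy complexity heuristic based on prompt length in words."""
    words = len(prompt.split())
    if words < 50:
        return "SIMPLE"
    if words < 400:
        return "MEDIUM"
    return "COMPLEX"


def pick_backend(prompt: str, requested_model: Optional[str] = None) -> str:
    """Send explicitly requested exotic models to the hosted router, the rest by tier."""
    if requested_model is not None and requested_model not in DIRECT_MODELS:
        return LONG_TAIL_BACKEND
    return ROUTES[classify(prompt)]


easy = pick_backend("rename this variable")
hard = pick_backend("word " * 500)
exotic = pick_backend("hi", requested_model="some-exotic-model")
```

The ordering matters: the long-tail check runs first so that only requests naming a model you have no key for ever touch the fee-charging backend.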
Lynkr is Apache-2.0, self-hosted, supports 13 providers including Ollama, llama.cpp, LM Studio, Bedrock, Azure, Databricks - and OpenRouter itself as a tier: github.com/Fast-Editor/Lynkr. Benchmarks with methodology are in the repo; run them on your own workload before believing anyone's percentages, including mine.