DEV Community 1h ago

The 5% Router Tax: What Hosted LLM Gateways Charge For (and How to Self-Host It)

What the Fee Buys

Unified API across providers - one format in, translated per-provider out.
Failover - a provider 500s, your request retries elsewhere.
Model marketplace - new models available the day they launch.
Consolidated billing - one invoice instead of six provider accounts.
(Sometimes) smart routing - OpenRouter's auto router picks a model per-request.

Items 1, 2, and 5 are software. Items 3 and 4 are genuinely hard to self-host - if you want day-one access to every new model with zero account setup, the marketplace earns its fee. But most coding workloads use a handful of models, not four hundred.

The Parts a Hosted Router Structurally Can't Give You

Local models as a tier. No hosted router will route your easy requests to the Ollama instance on your own machine - free, private, zero latency to first byte on cached weights. For coding traffic, where (in my instrumented sessions) 70–90% of requests are simple enough for a good local model, this is the single biggest cost lever, and it's only available to something running on your side of the wire.
Your data staying home. Self-hosted means prompts, code, and keys never transit a third party. For anyone with a compliance requirement - or code they'd rather not ship to a router's logs - this isn't a preference, it's a prerequisite.
Token optimization before the bill. A hosted router bills you for the tokens you send it - it has no incentive to shrink them. A self-hosted proxy can strip unusable tool schemas (measured: −53% on tool-heavy requests) and compress JSON tool results (measured: 3,458 → 427 tokens on a grep result) before any provider bills you. That's not a routing saving; it stacks on top of routing.
No availability dependency. Hosted routers go down (OpenRouter's outages have their own HN threads) and offer no SLA at consumer tiers. A local proxy fails independently of anyone's status page.

What Self-Hosting Costs You

Honesty cuts both ways:

You run a process. npm install -g lynkr && lynkr init && lynkr start - but it's yours now: updates, logs, the works.
You manage provider accounts. Two or three API keys instead of one. The consolidated invoice is genuinely gone.
Model lag. A new provider means waiting for support (or a PR) instead of it appearing in a dropdown.
Nobody to email. Self-hosted support is a GitHub issue tracker.

If those trade-offs read as "fine," the math is straightforward: the 5% fee disappears, the local-tier routing removes the easy majority of requests from your bill entirely, and compression shrinks what's left.

The Hybrid That Actually Makes Sense

This isn't either/or. A pattern I see working:

Coding tool → self-hosted proxy (Lynkr)
├─ SIMPLE/MEDIUM → local Ollama/llama.cpp (free)
├─ COMPLEX → direct provider API keys (no fee)
└─ exotic models → OpenRouter (5% on the long tail only)

Keep a hosted router as one backend for the long tail of models you rarely need, route the bulk directly or locally, and let the proxy's classifier decide per-request. You get the marketplace when you want it without paying the tax on your entire volume.

Lynkr is Apache-2.0, self-hosted, supports 13 providers including Ollama, llama.cpp, LM Studio, Bedrock, Azure, Databricks - and OpenRouter itself as a tier: github.com/Fast-Editor/Lynkr. Benchmarks with methodology are in the repo; run them on your own workload before believing anyone's percentages, including mine.

Read on DEV Community ↗ ← Back to News

The 5% Router Tax: What Hosted LLM Gateways Charge For (and How to Self-Host It)

What the Fee Buys

The Parts a Hosted Router Structurally Can't Give You

What Self-Hosting Costs You

The Hybrid That Actually Makes Sense

Comments