I Made TS Compiler Graph MCP: 10x Fewer Tokens in Claude Code
DEV Community


1. Preface

1.1. What @ttsc/graph Is

On the left, the agent is lost in a maze of files, chasing dashed arrows dozens deep. On the right, it's reading a single compiler-built graph of nodes and edges, with the file:line anchors it can open and check.

You're new to a TypeScript repo, so you ask the agent for a tour: what's the main runtime flow, from the public API down to the code that does the work, and what should you read first? You know how it goes. It opens a file, follows an import into another, then another, and a few dozen files later it gives you an answer.

@ttsc/graph cuts that crawl short. Over MCP, it hands your agent a graph of your TypeScript codebase that the compiler itself drew: what calls what, what depends on what, and where each piece lives. The agent answers structural questions straight from the graph instead of spelunking through files, and every claim it makes points at an exact file:line the compiler resolved. Nothing invented, just a location you can open and check for yourself.

In the benchmark, it's the same question and the same agent in every case, and only @ttsc/graph's token spend stays flat across the repos, however big they get. The other three - codegraph, codebase-memory, and serena - swing all over the place, and some even spend more than the baseline does. If you want to dig into the details, every model, the per-repo prompts alongside this shared one, and the full method are on the interactive benchmark page.

1.2. What a Code Graph Is

A city map keeps every street and building. A subway map throws most of that away and keeps the connections you need. So what is a "code graph"?

Land in a city you've never visited and you don't read every street sign in order; you glance at a subway map. It throws away the buildings and the streets and the distances, and keeps the one thing you need: what connects to what. You can read the whole thing in five seconds.

Code has the same shape. The nodes are functions, classes, and files; the edges are the calls, imports, and inheritance between them. Draw all of them and you have a code graph - an index of what calls what. The agent can query that index once instead of walking every street itself.
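To make that concrete, here is a minimal sketch of such a graph in TypeScript. The `CodeGraph` class, its node IDs, and its method names are hypothetical illustrations, not @ttsc/graph's internals; the point is only that "who calls this?" becomes a lookup over edges instead of a walk through files.

```typescript
// Hypothetical code-graph shape: nodes are files, classes, and functions;
// edges are the calls, imports, and inheritance between them.
type NodeKind = "file" | "class" | "function";
type EdgeKind = "calls" | "imports" | "extends";

interface GraphNode {
  id: string; // illustrative ID scheme, e.g. "src/api.ts#parse"
  kind: NodeKind;
}

interface GraphEdge {
  from: string;
  to: string;
  kind: EdgeKind;
}

class CodeGraph {
  private readonly nodes = new Map<string, GraphNode>();
  private readonly edges: GraphEdge[] = [];

  addNode(node: GraphNode): void {
    this.nodes.set(node.id, node);
  }

  addEdge(edge: GraphEdge): void {
    this.edges.push(edge);
  }

  hasNode(id: string): boolean {
    return this.nodes.has(id);
  }

  // "Who calls X?" answered by scanning edges, without opening any file.
  callersOf(id: string): string[] {
    return this.edges
      .filter((e) => e.kind === "calls" && e.to === id)
      .map((e) => e.from);
  }

  // "What does X eventually reach?" as a depth-first walk over call edges.
  reachableFrom(id: string): string[] {
    const seen = new Set<string>();
    const stack: string[] = [id];
    while (stack.length > 0) {
      const current = stack.pop() as string;
      for (const e of this.edges) {
        if (e.kind === "calls" && e.from === current && !seen.has(e.to)) {
          seen.add(e.to);
          stack.push(e.to);
        }
      }
    }
    return Array.from(seen);
  }
}

// Usage: a tiny graph for a public API that delegates down to a worker.
const graph = new CodeGraph();
for (const id of ["api.ts#run", "core.ts#plan", "core.ts#execute"]) {
  graph.addNode({ id, kind: "function" });
}
graph.addEdge({ from: "api.ts#run", to: "core.ts#plan", kind: "calls" });
graph.addEdge({ from: "core.ts#plan", to: "core.ts#execute", kind: "calls" });

console.log(graph.callersOf("core.ts#execute")); // [ 'core.ts#plan' ]
console.log(graph.reachableFrom("api.ts#run")); // [ 'core.ts#plan', 'core.ts#execute' ]
```

A real index would store edges by endpoint rather than scanning a flat array, but the flat list keeps the shape of the idea visible.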

1.3. Index, Not Source Inlining

This is the first real fork in the road. @ttsc/graph doesn't return source bodies at all. What it returns is names, edges, signatures, and spans - and nothing more than that.

The span is the part that matters. Every range it cites is a coordinate the compiler vouched for, so if you doubt an answer, you open that exact spot and check it. What reassures me isn't that it read no files; it's that a place to verify always comes attached.
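Here is a sketch of what "index, not source" can look like: each entry carries a name, a signature, its outgoing calls, and a span, and the span is rendered as a file:line anchor you can open. The `IndexEntry` shape and the text format are assumptions for illustration, not @ttsc/graph's actual response format.

```typescript
// Hypothetical index entry: deliberately no field for the source body.
interface Span {
  file: string;
  startLine: number; // 1-based
  endLine: number; // 1-based, inclusive
}

interface IndexEntry {
  name: string;
  signature: string;
  span: Span;
  calls: string[];
}

// Render a span as the file:line anchor a reader can open to verify a claim.
function anchor(span: Span): string {
  return span.startLine === span.endLine
    ? `${span.file}:${span.startLine}`
    : `${span.file}:${span.startLine}-${span.endLine}`;
}

function renderEntry(entry: IndexEntry): string {
  const callees = entry.calls.length > 0 ? entry.calls.join(", ") : "(none)";
  return `${entry.name}${entry.signature} @ ${anchor(entry.span)}\n  calls: ${callees}`;
}

// Usage with a made-up entry.
const entry: IndexEntry = {
  name: "parseConfig",
  signature: "(path: string): Config",
  span: { file: "src/config.ts", startLine: 12, endLine: 40 },
  calls: ["readFile", "validate"],
};
console.log(renderEntry(entry));
// parseConfig(path: string): Config @ src/config.ts:12-40
//   calls: readFile, validate
```

However long the body of `parseConfig` is, the entry stays two lines; the anchor is what lets you go read the body yourself if you doubt the answer.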

Tokens stay flat because the size of the response doesn't depend on the size of the repo. Whether the project is 100k lines or 10k, one question comes back with a similarly sized chunk of index. Why the other tools spill source instead is the subject of the autopsy in §2.
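The flat-token argument is plain arithmetic. The toy model below uses made-up numbers (not benchmark data) and an assumed per-entry size: a crawl pays for every file body it opens, so its cost grows with the repo, while an index answer pays per returned entry, so its cost tracks the question.

```typescript
// Toy cost model with hypothetical numbers; not measured data.
const TOKENS_PER_ENTRY = 40; // assumed size of one name/signature/span line

// A crawl pays for every file body it reads.
function crawlCost(fileTokens: number[]): number {
  return fileTokens.reduce((sum, t) => sum + t, 0);
}

// An index answer pays per entry returned, however large the files behind them are.
function indexCost(entriesReturned: number): number {
  return entriesReturned * TOKENS_PER_ENTRY;
}

// Same question on a small and a large repo: the crawl touches more and bigger
// files in the large one, while the answer needs about the same number of entries.
const smallRepoCrawl = crawlCost(new Array<number>(10).fill(500)); // 10 files read
const largeRepoCrawl = crawlCost(new Array<number>(60).fill(1200)); // 60 files read
const answerEither = indexCost(25);

console.log({ smallRepoCrawl, largeRepoCrawl, answerEither });
// { smallRepoCrawl: 5000, largeRepoCrawl: 72000, answerEither: 1000 }
```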

1.4. How It Works, in One Breath

The short version: it reads an index out of the program the TypeScript compiler has already resolved, and it feeds a single MCP tool a forced chain-of-thought so the agent doesn't wander off. That's all there is to it.
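One way to read "forced chain-of-thought" is as a typed input contract: the tool's schema requires the agent to fill in its reasoning fields before the query, and a call that skips a step is rejected. The sketch below assumes that reading; the field names (`question`, `neededFacts`, `query`) and the validator are hypothetical, not @ttsc/graph's real schema. MCP tools do advertise a JSON Schema `inputSchema`, which is where required fields like these would live.

```typescript
// Hypothetical input contract for a single graph tool; field names are illustrative.
// Properties are listed in the order the agent is expected to fill them in.
interface GraphToolInput {
  question: string; // the user's question, restated
  neededFacts: string; // which structural facts (calls, imports, spans) would settle it
  query: string; // the plain-language request sent to the graph
}

// JSON Schema of the kind an MCP server advertises as a tool's inputSchema.
const inputSchema = {
  type: "object",
  properties: {
    question: { type: "string", description: "Restate what the user asked." },
    neededFacts: { type: "string", description: "Which structural facts would answer it?" },
    query: { type: "string", description: "Plain-language request for the graph." },
  },
  required: ["question", "neededFacts", "query"],
  additionalProperties: false,
} as const;

type ValidationResult =
  | { ok: true; value: GraphToolInput }
  | { ok: false; errors: string[] };

// Minimal runtime check of the required fields only: a call that skips any
// reasoning step is rejected with a message the agent can act on.
function validateInput(input: unknown): ValidationResult {
  if (typeof input !== "object" || input === null) {
    return { ok: false, errors: ["input must be an object"] };
  }
  const record = input as Record<string, unknown>;
  const errors: string[] = [];
  for (const key of inputSchema.required) {
    const value = record[key];
    if (typeof value !== "string" || value.trim() === "") {
      errors.push(`${key} must be a non-empty string`);
    }
  }
  if (errors.length > 0) {
    return { ok: false, errors };
  }
  return { ok: true, value: record as unknown as GraphToolInput };
}

// Rejected: the agent jumped straight to a query without saying what it needs.
console.log(validateInput({ question: "What is the main runtime flow?", query: "entry points" }));
```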

The rest of the post earns those claims. §2 is the autopsy of why the tools before it don't move the token bill; §3 is how this one does; §4 is how to run it yourself. First, why I built it.

2. Why I Built It

2.1. The Promise

Honestly, none of this was my idea. codegraph put a code graph in front of an agent over MCP first, and codebase-memory-mcp does much the same thing across 158 languages. serena comes at it from a different angle, wrapping a live language server (the same LSP backend your IDE uses) as a semantic toolkit rather than building a static graph.

The first time I saw the approach, I thought it was genuinely elegant - and I still do. The benchmark in this post runs all three against @ttsc/graph on two prompt families: a shared onboarding question, and codegraph's own per-repo questions. (codebase-memory-mcp doesn't ship any reproducible questions to port, and while serena does publish a reproducible evaluation, it's one you run on your own code rather than a fixed question set, so neither one gives us a comparable third family.)

And the core claim behind all of them is legitimate. The enemy is the agent's grep-find-Read crawl loop. Every "where is this?" becomes a grep, then a file open, then an import chased into another grep - dozens of round-trips that burn tokens and time. codegraph's pitch is clean: replace that loop with a single graph query, for "58% fewer tool calls" and "file reads to ~zero." codebase-memory-mcp goes further and headlines "120× fewer tokens," a 99.2% cut. serena asks the agent to route both its reading and its editing through symbol-aware tools it calls "much more token-efficient than your own."

It all sounded great, so I installed all three.

One thing up front. Cutting an agent's tool-call count is the easy part; cutting the tokens it spends is harder. Cutting the tokens without making the answer worse is the real problem, and it's easy to mistake the first of those for the last.

2.2. So I Tried Them

What actually happened was a lot less exciting than the pitch. Take zod on GPT-5.5: all three (codegraph, codebase-memory, and serena) made the agent spend more than it would have with no MCP at all, by +22% to +27%, while @ttsc/graph came in at 6% of the baseline. This is the starkest repo I found, but the pattern holds across the matrix.

The tool-call count did fall, but the thing you pay for - the tokens - didn't budge. In some cells of the matrix it came out worse than the baseline: codegraph spent up to 47% more, codebase-memory-mcp up to 66% more, and serena up to 93% more.

Claude Code and Codex answered less accurately, not more. They kept missing what I was actually asking for. Extra MCP calls fired on their own and delayed the work I wanted done. I couldn't just ask in plain language either. Every query had to be shaped by hand. codebase-memory-mcp's sharpest tools want a Cypher query or an exact qualified_name, codegraph wants you to name the symbols in a flow, and serena wants you to activate the project and then hand it an exact name_path you don't have yet. The hand-shaping alone cost more time than the tools gave back.

So for my kind of question, all three left me worse off than before I installed them. Two things are true at once: they're the strongest tools of their kind, built by people who saw the problem before I did; and on an open question, every one made the agent spend more than nothing would have. That's when I went digging.

2.3. Bodies, and a Buried Graph

The first cause is what each tool does with what it already has.

codegraph returns whole source bodies. In its own words, the output is "byte-for-byte identical to what the Read tool returns," and it tells the agent to "treat each block as a Read you have already performed." It's the Read, done for you - which is fine when you're editing. But for a broad "how does this work?" question, the body is the token bomb. Once you're past a handful of files it hits its cap, and the overflow collapses into an "additional relevant files (not shown)" list that tells you not to Read them yourself, so you call codegraph_explore again and get more bodies back.

codebase-memory-mcp is the more interesting case, because underneath it has the right idea. It builds a real relation graph - a lot like the one I ended up with - that tracks not just where things are but how they call and depend on each other. The trouble is the surface in front of it. That capability is spread across fourteen MCP tools, and the ones that actually reach the relation data want precise input: either a Cypher query or an exact qualified_name. (There's a plain-language search too, but it doesn't touch the graph the way the query tools do.) Faced with fourteen tools and a query language, the agent mostly never reached the relation data at all. In the runs I measured it called the MCP zero times and fell back to the shell every time. It gave up and grepped. The graph was right there; the surface buried it.

serena gets both wrong at once. Because it's backed by a language server, its symbols come out properly resolved, but it still serves source bodies on demand, and it puts them behind around fifty tools that the agent only reaches after activating the project and waiting out a language-server cold start. Given all that, it grepped too, with a median of zero to one MCP calls before falling back to the shell.

So one hands back too much, another keeps the right thing behind a door the agent can't find, and the third manages to do both. None of them moves the token bill.

2.4. The Instructions

The second cause is the instructions, and here the three tools scatter in every direction.

codegraph forces its tool, and it forces it hard. The MCP instructions tell the agent to "use it instead of reading files," to "call it before you Read," to "don't grep or Read first," and to reach for it on "almost any question." So it fires even when the graph isn't the answer - on a config file, a small edit, or a question it can't answer at all - and those wasted calls get in the way of the real work. The README is candid about why: the tool only helps when you query it directly and is pure overhead otherwise, so the instructions lean hard on the agent to keep it from becoming overhead.

codebase-memory-mcp does the opposite. Its MCP initialize sends no instructions at all. What guidance it does have lives in an install-time skill file that the agent never sees if you just wire up the server, and its auto-indexing is off by default. So you get fourteen tools over the wire with almost nothing telling the agent which one to reach for or when - and, as we just saw, it mostly didn't and fell back to the shell.

serena over-directs too, and more literally than the others. In the Claude Code setup its instructions forbid the agent from using its own Read and Edit on code files, swap in a wholesale replacement system prompt, and warn that the built-in editor "will deny such edits." It gives the agent its own read, grep, and edit tools to use in their place - swapping out the agent's hands for its own. And even after all that, the agent still reached for its own grep.

@ttsc/graph puts a hammer in your hand, and you're still holding the pen and the wrench and everything else. serena takes the hand off and bolts a hammer where it used to be: great for driving nails, but now that's all the arm can do.

So one under-directs, the other two over-direct, and none of them lands on what you actually want: for the agent to use the graph when it helps and stop when it doesn't. Forcing a tool isn't the same as getting it adopted, and a big surface area isn't the same as capability. A tool the agent won't reach for is worse than no tool at all, because you're still paying for its description on every single turn.
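The "paying for its description on every single turn" point is simple arithmetic. In this toy model, every input is hypothetical and prompt caching is ignored (caching can discount a repeated prefix, so real overhead may be lower): a tool's definitions and instructions ride along on every request, but it only saves tokens on the turns where the agent actually calls it.

```typescript
// Toy arithmetic; every input is a hypothetical number, not a measurement.
interface SessionModel {
  descriptionTokens: number; // tool definitions + instructions sent with each request
  turns: number; // requests in the session
  turnsUsingTool: number; // requests where the agent actually called the tool
  savedPerUse: number; // tokens saved each time it replaces a crawl
}

function netTokenSavings(m: SessionModel): number {
  const overhead = m.descriptionTokens * m.turns; // paid whether or not the tool is used
  const savings = m.savedPerUse * m.turnsUsingTool; // earned only when it is used
  return savings - overhead;
}

const neverCalled = netTokenSavings({ descriptionTokens: 2000, turns: 10, turnsUsingTool: 0, savedPerUse: 5000 });
const calledHalfTheTime = netTokenSavings({ descriptionTokens: 2000, turns: 10, turnsUsingTool: 5, savedPerUse: 5000 });
console.log(neverCalled, calledHalfTheTime); // -20000 5000
```

A tool that never gets called can only come out negative, which is the sense in which it is worse than no tool at all.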

You can read the effort in the prompts themselves. serena's Claude Code configuration doesn't only tell the agent to prefer its tools; it enumerates specific rationalizations and rules each one out, down to "I already know the path" and "one Read call is faster than three Serena calls." Every line in that list is a rationalization the model actually produced, caught and forbidden one at a time. It's real craft, and there's a lot of it: serena swaps in about a hundred and fifty lines of replacement system prompt before the first question, and codegraph ships around a hundred lines of instructions plus a skill file injected at install time. Almost all of it exists to secure one thing: that the agent actually calls the tool. On my questions, none of it worked. You can't write your way to adoption.

To their credit, none of the three hide their limits. codegraph's README says its token savings depend on scale and are small on a normal codebase. codebase-memory-mcp reports its biggest numbers on a handful of structural queries rather than open-ended ones. serena's own docs admit its symbolic tools win on large edits but lose to a plain text edit on small ones. All three tell you plainly that the wins were measured on targeted scenarios.

And that's the point. Those limits only get bigger the more general your usage is, and general usage is where I spend my days.

But none of it means the approach was wrong. The idea is a lovely one; materializing it took a lot of trial and error, and the three of them did that work in the open, wall by wall. I came to the same problem holding two things they didn't: a compiler toolchain that hands me a resolved graph for free, and a way to make the agent comply with a typed contract. So I took what they proved was worth wanting and built it for the case they weren't aimed at: the open-ended question, with the tokens down and the answer no worse.

3. How It Works

@ttsc/graph is built to fix those problems one at a time. There are four of them, and the table below lines each one up with its fix.

Pain (§2) | Antidote (§3)
2.4 - over-forced (up to a hard ban), or no guidance at all | 3.1 - guides without forcing (escape + stop)
2.3 - a real graph buried under 14 (or ~50) tools | 3.2 - one tool, asked in plain language
2.3 - source bodies blow up tokens | 3.3 - returns index on
