DEV Community 1h ago

A cheap, persistent memory that learns your repo so your agent stops re-reading it

How it works

Every Claude Code session re-discovers your codebase from scratch - re-reading files, re-grepping, re-paying for the same context. live-memory runs a separate, cheaper large-context model in a long-running MCP server whose only job is to accumulate knowledge of your repository over time. Your main agent asks it questions through one read-only tool, ask_live_memory, instead of reloading the repo itself.

Passive learning

PostToolUse and FileChanged hooks tee the content of every file your agent reads or edits into the memory, so it learns the code as a side effect of real work, without paying to re-read it. The memory also stays current as the repo changes: an observed edit is authoritative and applied immediately, while an out-of-band change (external editor, git checkout) is flagged stale and re-read on next use. ask_live_memory is the active fallback for anything the passive layer hasn't seen yet.
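The freshness rule above (observed content is authoritative, out-of-band changes are only flagged and re-read lazily) can be sketched as follows. This is a minimal illustration, not live-memory's actual data model: the PassiveMemory class, its method names, and the read_file callback are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FileEntry:
    content: str
    stale: bool = False


@dataclass
class PassiveMemory:
    """Hypothetical store used only to illustrate the staleness rule."""
    files: dict[str, FileEntry] = field(default_factory=dict)

    def observe(self, path: str, content: str) -> None:
        # A hook saw the agent read or edit this file: the content is
        # authoritative, so it replaces whatever was stored and clears the flag.
        self.files[path] = FileEntry(content=content, stale=False)

    def mark_changed(self, path: str) -> None:
        # Out-of-band change: no new content is available yet, so only flag it.
        if path in self.files:
            self.files[path].stale = True

    def get(self, path: str, read_file: Callable[[str], str]) -> str:
        # Re-read lazily, on next use, if the entry is missing or stale.
        entry = self.files.get(path)
        if entry is None or entry.stale:
            self.observe(path, read_file(path))
            entry = self.files[path]
        return entry.content
```

With this shape, a git checkout that touches hundreds of files costs nothing up front; only the files that are actually asked about later get re-read.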

Architecture

The server is a singleton that serves every Claude Code session (many agents, concurrently and over time), keying state per workspace (cwd). Its context window is append-only between compactions. Compaction is a neutral, query-agnostic summarization into a knowledge ledger, never front-truncation, so older knowledge is distilled rather than dropped; a high/low watermark keeps compaction rare and batched. A background keep-warm loop holds the KV/prompt cache hot to cut latency and cost.
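A rough sketch of high/low-watermark compaction, under stated assumptions: the Window class, the whitespace token count, and the summarize callback (which would be a model call in practice) are all hypothetical, and the ledger is kept outside the window size for simplicity. The point is only that crossing the high mark triggers one batched summarization down to the low mark, instead of trimming a little on every append.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Window:
    high: int  # compact once the window grows past this many tokens
    low: int   # compact until the window is at or below this many tokens
    summarize: Callable[[list[str]], str]  # stand-in for a model call
    ledger: list[str] = field(default_factory=list)
    entries: list[str] = field(default_factory=list)

    @staticmethod
    def tokens(text: str) -> int:
        return len(text.split())  # crude whitespace count, for illustration

    def size(self) -> int:
        return sum(self.tokens(e) for e in self.entries)

    def append(self, entry: str) -> None:
        # Append-only between compactions.
        self.entries.append(entry)
        if self.size() > self.high:
            self.compact()

    def compact(self) -> None:
        # Move the oldest entries into one batch and summarize them together,
        # so their knowledge lands in the ledger instead of being discarded.
        batch: list[str] = []
        while self.entries and self.size() > self.low:
            batch.append(self.entries.pop(0))
        if batch:
            self.ledger.append(self.summarize(batch))
```

The gap between the two marks sets how often compaction runs: the wider it is, the more appends fit between two summarization calls.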

Configuration

It's provider-pluggable (Anthropic Messages with cache_control, or any OpenAI-compatible endpoint such as DeepSeek or a gateway) and zero-config: with no API key but a Claude subscription, it uses the subscription OAuth token (auto-refreshed) on Haiku. A two-tier timeout gives the model a budget and returns a best-effort answer before the hard MCP timeout. Everything it can touch is read-only and path-jailed. Human-facing status is available through a /live-memory-stats slash command, kept off the agent's tool surface.
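One way to picture the two-tier timeout, as a sketch only: the answer_within function, its parameters, and the streaming interface are assumptions for illustration, not the project's API. A soft budget is derived from the hard timeout minus a safety margin; if the model is still streaming when the soft budget expires, whatever has arrived so far is returned as the best-effort answer.

```python
import asyncio
from typing import AsyncIterator


async def answer_within(chunks: AsyncIterator[str], hard_s: float, margin_s: float) -> str:
    # Soft budget: stop early enough that the caller's hard timeout never fires.
    soft_s = max(hard_s - margin_s, 0.0)
    parts: list[str] = []

    async def consume() -> None:
        async for chunk in chunks:
            parts.append(chunk)

    try:
        await asyncio.wait_for(consume(), timeout=soft_s)
    except asyncio.TimeoutError:
        # Budget exhausted: keep the partial answer and mark it as such.
        parts.append(" [best-effort: budget reached]")
    return "".join(parts)
```

A caller with a 60-second hard limit might use `await answer_within(stream, hard_s=60, margin_s=5)`, so a slow generation degrades into a partial answer rather than a tool-call failure.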

Performance

In a claude -p A/B test on a real repo, on understanding-heavy work, the building (premium) model offloaded ~93% of its codebase-reading tokens to live-memory, cost ~61% less per task, and finished ~22% faster. Its cost also became more predictable: the without-memory arm occasionally spiraled into long re-reading loops.

Honest scope: pure edit/execution work is roughly break-even, and on realistic hybrid (understand-then-edit) tasks the all-in saving is a smaller ~11% per task - this makes understanding cheaper, not typing.

Choosing a companion model

The premium-model saving above is what you keep regardless; the only cost added back is running the small memory model, and that model is pluggable. The default is Haiku, but on our accuracy set (15 Q × 3 reps) deepseek-v4-flash matched, and slightly edged, Haiku (98% vs 91% correct, fewer hallucinations, both perfect on the negative traps) at ~8× lower token price. You can also point it at a local model for close to free. The cheaper the companion, the closer the all-in cost gets to the full premium saving: −25% all-in on Haiku → −57% on deepseek-v4-flash, versus the −61% building-model number.
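The relationship between these three percentages can be checked with a little arithmetic. The sketch below assumes all three figures are measured against the same without-memory baseline cost of 1.0 (the post does not state this explicitly), and treats the companion's cost as the gap between the all-in cost and the building-model cost.

```python
# Figures quoted above, as fractions of the without-memory baseline (assumed).
building_saving = 0.61
all_in_saving = {"haiku": 0.25, "deepseek-v4-flash": 0.57}

# What the premium model alone costs once it offloads reading to the memory.
building_cost = 1.0 - building_saving

# Companion cost = all-in cost minus the building model's share.
companion_cost = {
    name: round((1.0 - saving) - building_cost, 2)
    for name, saving in all_in_saving.items()
}

# Implied cost ratio between the two companions.
ratio = companion_cost["haiku"] / companion_cost["deepseek-v4-flash"]

print(companion_cost)   # haiku: 0.36, deepseek-v4-flash: 0.04 of baseline
print(round(ratio, 1))  # roughly 9x
```

Under that assumption, the implied companion cost ratio of about 9× is in the same range as the quoted ~8× token price gap, which is as close as percentages rounded to whole points allow.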

Installation

Live Memory is an HTTP MCP server you run once, plus a plugin that wires it in - start the server first:

# zero-config on a Claude subscription → Haiku
git clone https://github.com/shofer-dev/claude-code-live-memory
cd claude-code-live-memory/deploy && ./install-service.sh
# (or, instead of install-service.sh: cd ../server && pip install -e . && python -m live_memory)

Then, inside a Claude Code session:

/plugin marketplace add shofer-dev/claude-code-live-memory
/plugin install live-memory@shofer-live-memory

Ask your agent a whole-repo question - it'll call ask_live_memory instead of reading files.

Standalone Claude Code plugin - Python, Apache-2.0. Source: github.com/shofer-dev/claude-code-live-memory
