ContextLens — py-spy/pprof but for what's inside your LLM prompt
DEV Community, 5d ago


In multi-turn agent loops, the full context is re-sent on every API call. A tool result added at turn 3 gets billed again at turns 4, 5, 6, 7, and every turn after that. Most of it is never read again. Standard observability tools report the total token count, but they don't show what's in the context or how much of it is waste. That's what ContextLens fixes.

What it does

ContextLens is a diagnostic profiler for LLM agent context windows. It:

- Decomposes the context window into regions: system prompt, tool schemas, tool results, retrieved chunks, user messages, and assistant messages
- Tracks which blocks get re-billed across turns using SHA-256 content hashing
- Runs 5 waste detectors and ranks findings by dollar cost
- Prints a concrete one-line fix for each finding
- Renders an interactive D3 treemap report as a self-contained HTML file

Analysis needs no API key and works offline on saved traces.

The five detectors

| Detector | What it finds |
|---|---|
| Duplicate | Same block re-sent verbatim across multiple turns |
| Near-Duplicate | >85% Jaccard similarity between distinct blocks |
| Stale Tool Result | Tool output never referenced by a later assistant message |
| Unused Tool Schema | Tool defined every turn but never called |
| Redundant Retrieval | Retrieved chunk with <15% overlap with the model output |

---

Quickstart

```bash
pip install contextlens-profiler
```

Run the built-in demo (it simulates a 30-turn agent loop; no API key needed):

```bash
python -c "import contextlens; contextlens.demo()"
# or, from a clone of the repository:
python examples/demo.py
```

Live capture: Anthropic

```python
import anthropic
import contextlens as cl

client = anthropic.Anthropic()

with cl.capture_anthropic(client, model="claude-3-5-sonnet-20241022") as collector:
    for turn in range(20):
        client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,
            system="You are a helpful assistant.",
            # build_messages(turn) is your own helper returning this turn's message list
            messages=build_messages(turn),
        )

report = cl.analyze_trace(collector.build_trace())
print(f"Recoverable waste: {report.recoverable_tokens:,} tokens (${report.recoverable_cost_usd:.4f})")
```

Live capture: OpenAI

```python
import openai
import contextlens as cl

client = openai.OpenAI()

with cl.capture_openai(client, model="gpt-4o") as collector:
    for turn in range(20):
        # build_messages(turn) is your own helper returning this turn's message list
        client.chat.completions.create(model="gpt-4o", messages=build_messages(turn))

report = cl.analyze_trace(collector.build_trace())
```

Analyze a saved trace

```python
import contextlens as cl

report = cl.analyze_file("trace.json")
html = cl.render_html_report(report)

# Write the self-contained HTML report and close the file cleanly
with open("report.html", "w", encoding="utf-8") as f:
    f.write(html)
```

Example terminal output

```
+---------------------------------------------------------------------+
| ContextLens | Run demo-001                                          |
| Model: claude-3-5-sonnet-20241022 | Provider: anthropic | Turns: 30 |
+---------------------------------------------------------------------+

Context Composition by Region
Region             Tokens  Cost (USD)  Share
assistant_message  11,490  $0.0345     ###.......  25.5%
tool_result        10,333  $0.0310     ##........  22.9%
tool_schema         9,450  $0.0284     ##........  21.0%
retrieved_content   5,805  $0.0174     #.........  12.9%
user_message        4,740  $0.0142     #.........  10.5%
system              3,240  $0.0097     #.........   7.2%
TOTAL              45,058  $0.1352

Re-billing: 43,185 tokens (95.8%) re-billing waste -> $0.1296 recoverable

Top Waste Findings
#  Type           Sev.    Wasted Tokens  Cost     Fix
1  duplicate      medium          7,084  $0.0213  Cache or externalize...
2  redundant_ret  medium          5,805  $0.0174  Use a re-ranker...
3  unused_schema  low             3,150  $0.0095  Remove send_email...
```

Try the live demo

No install, no API key: https://huggingface.co/spaces/Harshal0610/contextlens

Links

- GitHub: https://github.com/HarshalSant/contextlens
- Install: pip install contextlens-profiler
- License: MIT

Feedback welcome, especially from anyone running multi-turn agent loops at scale. What waste patterns do you run into most?
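The opening claim, that a block added early keeps getting billed on every later call, is easy to put numbers on. This is a minimal sketch assuming an append-only context where every call re-sends the full history; `billed_tokens` is a hypothetical helper, not part of ContextLens.

```python
def billed_tokens(appended_per_turn: list[int]) -> tuple[int, int]:
    """Return (total billed input tokens, re-billed tokens) for an append-only context.

    appended_per_turn[i] is the number of tokens newly added to the context on call i.
    Assumption: every call re-sends the whole history, so a block added on call i is
    sent on calls i, i+1, ..., n-1: once as new input, then n-1-i more times.
    """
    n = len(appended_per_turn)
    total = 0
    rebilled = 0
    for i, tokens in enumerate(appended_per_turn):
        times_sent = n - i
        total += tokens * times_sent
        rebilled += tokens * (times_sent - 1)
    return total, rebilled


if __name__ == "__main__":
    # A 500-token tool result added at turn 3 of a 30-turn loop (index 2, 0-based)
    # is sent on 28 calls: billed once as new input and 27 more times as re-billing.
    trace = [0, 0, 500] + [0] * 27
    print(billed_tokens(trace))  # (14000, 13500)
```

With a constant t tokens appended per turn, the total over n turns is t·n(n+1)/2, so the bill grows quadratically with loop length even though each turn adds the same amount.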
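The SHA-256 content hashing used to track re-billed blocks, and the Duplicate detector that builds on it, can be approximated as follows. This is a minimal sketch assuming a hypothetical trace format (one list of `(region, text)` blocks per API call) and a whitespace word count as a stand-in for a real tokenizer; ContextLens's actual trace schema and token counting may differ.

```python
import hashlib
from collections import defaultdict


def block_hash(text: str) -> str:
    """SHA-256 hex digest of a block's exact text, used as its identity across turns."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def approx_tokens(text: str) -> int:
    # Hypothetical stand-in for a real tokenizer: counts whitespace-separated words.
    return len(text.split())


def track_rebilling(
    turns: list[list[tuple[str, str]]],
) -> tuple[int, dict[str, list[int]]]:
    """Return (re-billed tokens, {hash: turn indices}) for blocks sent on 2+ turns.

    turns: hypothetical trace, one list of (region, text) blocks per API call.
    """
    seen_in: dict[str, list[int]] = defaultdict(list)
    rebilled = 0
    for turn_idx, blocks in enumerate(turns):
        for _region, text in blocks:
            history = seen_in[block_hash(text)]
            if history and history[0] < turn_idx:
                # First sent on an earlier call, so these tokens are billed again.
                rebilled += approx_tokens(text)
            if not history or history[-1] != turn_idx:
                history.append(turn_idx)
    duplicates = {h: idxs for h, idxs in seen_in.items() if len(idxs) > 1}
    return rebilled, duplicates


if __name__ == "__main__":
    system = ("system", "You are a helpful assistant.")
    weather = ("tool_result", "weather: 18C, light rain")
    rebilled, dupes = track_rebilling([[system], [system, weather], [system, weather]])
    print(rebilled)    # 14
    print(len(dupes))  # 2
```

Hashing the exact text makes identity checks O(1) per block, but it only catches verbatim repeats; a single changed character yields a new hash, which is the gap a near-duplicate check has to cover.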
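The Near-Duplicate detector compares distinct blocks by Jaccard similarity with an 85% threshold. The article does not say what units the similarity is computed over, so this sketch assumes lowercase word sets; word shingles (overlapping n-grams) are a common alternative. `near_duplicates` and its signature are illustrative only.

```python
import re
from itertools import combinations


def word_set(text: str) -> set[str]:
    """Distinct lowercase word tokens in a block (assumed comparison unit)."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """|A ∩ B| / |A ∪ B| over the two blocks' word sets."""
    sa, sb = word_set(a), word_set(b)
    if not sa and not sb:
        return 1.0  # two empty blocks are treated as identical
    return len(sa & sb) / len(sa | sb)


def near_duplicates(blocks: list[str], threshold: float = 0.85) -> list[tuple[int, int, float]]:
    """(i, j, similarity) for every pair of non-identical blocks scoring above threshold."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(blocks), 2):
        if a == b:
            continue  # exact copies belong to the Duplicate detector
        score = jaccard(a, b)
        if score > threshold:
            pairs.append((i, j, score))
    return pairs


if __name__ == "__main__":
    a = "order 42 status shipped carrier ups eta monday items 3 total 59.90"
    b = a + " ref"
    c = "user asked about refund policy"
    print([(i, j) for i, j, _ in near_duplicates([a, b, c, a])])  # [(0, 1), (1, 3)]
```

Pairwise comparison is quadratic in the number of blocks; for long traces, MinHash with locality-sensitive hashing is the usual way to avoid scoring every pair.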
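The article does not say how the Stale Tool Result detector decides that a tool output was "referenced". One plausible heuristic, assumed here and not taken from ContextLens, is lexical: treat a tool result as referenced if some later assistant message shares at least `min_shared` distinctive terms with it. A purely lexical check like this cannot detect a tool result that shaped the answer without being echoed, so its output is best read as candidates to review rather than proof of waste.

```python
import re


def terms(text: str) -> set[str]:
    """Lowercase word tokens that are at least 4 characters long or contain a digit."""
    return {
        w for w in re.findall(r"\w+", text.lower())
        if len(w) >= 4 or any(ch.isdigit() for ch in w)
    }


def stale_tool_results(messages: list[tuple[str, str]], min_shared: int = 2) -> list[int]:
    """Indices of tool messages that no later assistant message appears to reference.

    messages: hypothetical (role, text) pairs in conversation order.
    A tool result counts as referenced if a later assistant message shares at
    least `min_shared` terms with it (an assumed lexical heuristic).
    """
    stale = []
    for i, (role, text) in enumerate(messages):
        if role != "tool":
            continue
        tool_terms = terms(text)
        referenced = any(
            len(tool_terms & terms(later_text)) >= min_shared
            for later_role, later_text in messages[i + 1:]
            if later_role == "assistant"
        )
        if not referenced:
            stale.append(i)
    return stale


if __name__ == "__main__":
    msgs = [
        ("user", "what's the weather in Oslo and my next meeting?"),
        ("tool", "weather Oslo: 18C light rain"),
        ("tool", "calendar: standup at 09:30 in room Fjord"),
        ("assistant", "It is 18C with light rain in Oslo."),
    ]
    print(stale_tool_results(msgs))  # [2]
```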
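The Unused Tool Schema detector reduces to set arithmetic over the tools sent and the tools called. This sketch assumes a hypothetical `Turn` record holding both sets, plus per-tool schema sizes in tokens supplied by the caller. As a consistency check on the example output, a 105-token schema sent on all 30 demo turns would account for exactly the 3,150 tokens reported for `send_email`; the article does not state the actual schema size.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    # Hypothetical per-call record (not a ContextLens type).
    defined_tools: set[str]                              # tool names sent in the request
    called_tools: set[str] = field(default_factory=set)  # tool names the model invoked


def unused_tool_schemas(turns: list[Turn]) -> list[str]:
    """Tools defined on every turn but never called on any turn, sorted by name."""
    if not turns:
        return []
    defined_every_turn = set.intersection(*(t.defined_tools for t in turns))
    ever_called = set().union(*(t.called_tools for t in turns))
    return sorted(defined_every_turn - ever_called)


def unused_schema_waste(turns: list[Turn], schema_tokens: dict[str, int]) -> dict[str, int]:
    """Tokens spent re-sending each unused schema on every turn."""
    return {name: schema_tokens[name] * len(turns) for name in unused_tool_schemas(turns)}


if __name__ == "__main__":
    tools = {"search", "send_email"}
    turns = [Turn(tools, {"search"})] + [Turn(tools) for _ in range(29)]
    # Hypothetical schema sizes in tokens
    print(unused_schema_waste(turns, {"search": 200, "send_email": 105}))  # {'send_email': 3150}
```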
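The Redundant Retrieval detector flags retrieved chunks with under 15% overlap with the model output. The overlap measure is not specified, so this sketch assumes it is the fraction of a chunk's distinct words that also appear in the output; `overlap_ratio` and `redundant_retrievals` are hypothetical helpers.

```python
import re


def words(text: str) -> set[str]:
    """Distinct lowercase word tokens in a text."""
    return set(re.findall(r"\w+", text.lower()))


def overlap_ratio(chunk: str, output: str) -> float:
    """Fraction of the chunk's distinct words that also appear in the model output."""
    chunk_words = words(chunk)
    if not chunk_words:
        return 0.0
    return len(chunk_words & words(output)) / len(chunk_words)


def redundant_retrievals(chunks: list[str], output: str, threshold: float = 0.15) -> list[int]:
    """Indices of retrieved chunks whose overlap with the output is below threshold."""
    return [i for i, chunk in enumerate(chunks) if overlap_ratio(chunk, output) < threshold]


if __name__ == "__main__":
    chunks = [
        "refunds are accepted within 30 days of purchase",
        "our office dog is named biscuit",
    ]
    output = "Refunds are accepted within 30 days."
    print(redundant_retrievals(chunks, output))  # [1]
```

Normalizing by the chunk's own size (rather than the output's) means a long chunk that contributed only one sentence still scores low, which is the case a re-ranker fix is meant to target.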
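The dollar figures in the example terminal output are consistent with a flat input price of $3 per million tokens, Anthropic's published input rate for Claude 3.5 Sonnet. The sketch below recomputes the region costs, shares, and recoverable amount from the token counts under that assumption; `PRICE_PER_MTOK` is our assumption, not a value read from ContextLens, and output-token pricing is ignored.

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumption: flat $3 per million input tokens (Claude 3.5 Sonnet input price).
PRICE_PER_MTOK = Decimal("3")

# Token counts per region, copied from the example terminal output.
REGION_TOKENS = {
    "assistant_message": 11_490,
    "tool_result": 10_333,
    "tool_schema": 9_450,
    "retrieved_content": 5_805,
    "user_message": 4_740,
    "system": 3_240,
}


def cost_usd(tokens: int) -> Decimal:
    """Dollar cost of `tokens` input tokens, rounded half-up to 4 decimal places."""
    raw = Decimal(tokens) * PRICE_PER_MTOK / Decimal(1_000_000)
    return raw.quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)


def composition(region_tokens: dict[str, int]) -> list[tuple[str, int, Decimal, float]]:
    """(region, tokens, cost, share %) rows, largest region first."""
    total = sum(region_tokens.values())
    rows = [
        (region, tokens, cost_usd(tokens), round(100 * tokens / total, 1))
        for region, tokens in region_tokens.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    total = sum(REGION_TOKENS.values())
    for region, tokens, cost, share in composition(REGION_TOKENS):
        print(f"{region:<18} {tokens:>7,} ${cost}  {share:>5.1f}%")
    print(f"{'TOTAL':<18} {total:>7,} ${cost_usd(total)}")
    rebilled = 43_185  # re-billed tokens from the example output
    print(f"Re-billed: {100 * rebilled / total:.1f}% -> ${cost_usd(rebilled)}")
```

Decimal arithmetic with explicit half-up rounding avoids float artifacts on values such as 0.02835, which the report shows as $0.0284.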


Comments

Huh, the stale tool result detector is really smart because it catches that subtle pattern where a tool output gets appended but the model never actually refers back to it, yet you keep paying for those tokens across every subsequent turn. Does ContextLens also account for cases where the tool result might influence the model's behavior implicitly even if it's not explicitly referenced in the assistant message?