I Compared the Real Cost of Claude Code, OpenRouter, and Image APIs
An API request that looks cheap on a pricing page can become much more expensive inside a real product. The pricing page normally gives you the unit rate:

- Price per million input tokens
- Price per million output tokens
- Price per generated image
- Price per second of video
- Price per credit

That is useful, but it is not yet a production budget. A production workflow can also include repeated context, tool results, cache writes, retries, duplicate submissions, failed media jobs, and outputs that users reject and regenerate.

I wanted to see how much those factors change the estimate, so I built the same budget model for three different workloads:

- A normal LLM application
- A coding-agent workflow
- An image-generation feature

This article explains the method. I have also published an editable calculator and a downloadable CSV dataset, linked at the end.

Scenario 1: A normal LLM application

Consider a small application with the following usage:

- 100 monthly active users
- 20 active days per month
- 3 requests per user per active day
- 2,000 average input tokens per request
- 500 average output tokens per request
- $3 per 1M input tokens
- $15 per 1M output tokens
- 3% retry rate
- 15% planning buffer

These are editable planning assumptions, not universal usage averages or an official provider quote.

The first step is to estimate request volume:

monthly requests = 100 users × 20 active days × 3 requests
monthly requests = 6,000

The input-token cost is:

input cost = 6,000 requests × 2,000 input tokens ÷ 1,000,000 × $3
input cost = $36

The output-token cost is:

output cost = 6,000 requests × 500 output tokens ÷ 1,000,000 × $15
output cost = $45

That produces a listed base cost of:

$36 + $45 = $81

Applying the editable 3% retry assumption:

$81 × 1.03 = $83.43

Adding a further 15% planning buffer:

$83.43 × 1.15 = $95.94

The combined uplift is modest here (about $15 on an $81 base), but because the multipliers are proportional, the same percentages translate into much larger dollar amounts at higher volume. The important point is that $95.94 is not an official price quote.
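The Scenario 1 arithmetic is easy to reproduce in code, which makes it simple to change one assumption at a time. The sketch below is a minimal Python version. It assumes, as the worked example does, that the retry rate and planning buffer are applied as simple multipliers on the base cost and that each retry costs a full request. The function and field names are hypothetical and not taken from any provider SDK or from the published calculator.

```python
from dataclasses import dataclass


@dataclass
class LlmWorkload:
    """Editable planning assumptions for a simple request/response app (hypothetical names)."""
    monthly_active_users: int
    active_days_per_month: int
    requests_per_user_per_day: float
    avg_input_tokens: float
    avg_output_tokens: float


def monthly_llm_budget(
    workload: LlmWorkload,
    input_price_per_m: float,   # USD per 1M input tokens (user-entered, not a quote)
    output_price_per_m: float,  # USD per 1M output tokens (user-entered, not a quote)
    retry_rate: float,          # e.g. 0.03 for 3%; assumes each retry costs a full request
    buffer_rate: float,         # e.g. 0.15 for a 15% planning buffer
) -> dict[str, float]:
    """Return each layer separately so the buffer never hides the base estimate."""
    requests = (
        workload.monthly_active_users
        * workload.active_days_per_month
        * workload.requests_per_user_per_day
    )
    input_cost = requests * workload.avg_input_tokens / 1_000_000 * input_price_per_m
    output_cost = requests * workload.avg_output_tokens / 1_000_000 * output_price_per_m
    base_cost = input_cost + output_cost
    with_retries = base_cost * (1 + retry_rate)
    planned_budget = with_retries * (1 + buffer_rate)
    return {
        "requests": requests,
        "input_cost": input_cost,
        "output_cost": output_cost,
        "base_cost": base_cost,
        "with_retries": with_retries,
        "planned_budget": planned_budget,
    }


if __name__ == "__main__":
    scenario_1 = LlmWorkload(100, 20, 3, 2_000, 500)
    result = monthly_llm_budget(scenario_1, 3.0, 15.0, retry_rate=0.03, buffer_rate=0.15)
    print(f"monthly requests: {result['requests']:,.0f}")
    for name in ("input_cost", "output_cost", "base_cost", "with_retries", "planned_budget"):
        print(f"{name}: ${result[name]:,.2f}")
```

Run as a script, it prints a base cost of $81.00 and a planned budget of $95.94. Returning the base cost and the buffered budget as separate values keeps the derived $95.94 distinguishable from the listed-price arithmetic.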
The $95.94 total is a derived planning result based on:

- User-entered unit prices
- A specific workload
- An estimated retry rate
- An estimated budget buffer

Those inputs should stay separate, so it is always clear which numbers come from provider pricing and which are your own assumptions.

Scenario 2: Claude Code and other coding agents

A coding-agent task is not equivalent to one chat message. One completed task may involve:

- Reading project instructions
- Loading source files
- Inspecting dependencies
- Receiving tool definitions
- Running shell commands
- Processing command output
- Editing files
- Running tests
- Retrying failed operations
- Preserving conversation history

The user may only see a short final response such as:

"Fixed the validation bug and added a regression test."

However, the agent may have processed far more context before producing that response. For coding-agent workflows, I find this cost unit more useful:

cost per completed task

rather than:

cost per user message

A simplified task formula is:

task cost = model calls per task × estimated cost per model call × retry multiplier

Each model call may include:

input tokens = instructions + conversation history + source files + tool definitions + previous tool results

That is why two tasks with similarly short final answers can have very different costs. A one-file configuration fix and a repository-wide migration should not share the same expected token budget.

Tool use adds context

Even without a separate per-tool fee, tool use increases the total cost, because every step adds tokens. The model may need to:

- Receive the tool definition
- Decide which tool to call
- Receive the tool result
- Read and reason about that result
- Generate the next action

Large command outputs and large file reads can become part of later input context. For a deeper breakdown, see the Claude Code Token Cost Guide.

Scenario 3: Image generation

Image generation needs a different cost model.
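The core change for images is the unit of cost: the product needs accepted images, but the provider bills submitted (or completed) jobs. The minimal Python sketch below models that gap, assuming a single flat price per billed job. The `billed_share` parameter is a hypothetical knob for providers that do not bill every failed or rejected attempt. It defaults to 1.0 (every submitted job billed), which is the pessimistic case. The example values are the illustrative figures used in this section: 1,000 accepted images, 2.4 attempts each, and $0.04 per job.

```python
from dataclasses import dataclass


@dataclass
class ImageBudget:
    submitted_jobs: float
    billed_jobs: float
    total_cost: float
    cost_per_accepted_image: float


def image_budget(
    accepted_images: int,
    attempts_per_accepted: float,  # average generations submitted per image a user keeps
    price_per_billed_job: float,   # flat illustrative price; real billing units vary
    billed_share: float = 1.0,     # hypothetical: fraction of submitted jobs actually billed
) -> ImageBudget:
    """Estimate monthly cost per *accepted* image rather than per submitted job."""
    if accepted_images <= 0:
        raise ValueError("accepted_images must be positive")
    if not 0.0 <= billed_share <= 1.0:
        raise ValueError("billed_share must be between 0 and 1")
    submitted = accepted_images * attempts_per_accepted
    billed = submitted * billed_share
    total = billed * price_per_billed_job
    return ImageBudget(submitted, billed, total, total / accepted_images)


if __name__ == "__main__":
    estimate = image_budget(1_000, 2.4, 0.04)
    print(f"submitted jobs: {estimate.submitted_jobs:,.0f}")
    print(f"total cost: ${estimate.total_cost:,.2f}")
    print(f"cost per accepted image: ${estimate.cost_per_accepted_image:.3f}")
```

With `billed_share` left at 1.0, the sketch gives 2,400 submitted jobs, $96 in total, and $0.096 per accepted image. Lower `billed_share` only after request IDs and the billing dashboard show that some failed jobs were not charged.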
Depending on the provider and model, the billing unit may be:

- Per generated image
- Per resolution or quality tier
- Per megapixel
- Input and output image tokens
- Credits
- A combination of several units

The cost of one accepted image also differs from the cost of one submitted API job. Suppose a product needs 1,000 accepted images per month. If users accept the first result every time, the application submits approximately 1,000 generation jobs. But if one accepted image requires an average of 2.4 attempts:

1,000 accepted images × 2.4 attempts = 2,400 submitted jobs

At an illustrative price of $0.04 per submitted generation:

2,400 × $0.04 = $96

The effective cost per accepted image becomes:

$96 ÷ 1,000 = $0.096

That is 2.4 times the listed price of a single generation.

This does not mean every provider charges for every failed request. Failed-job billing varies by provider, failure type, and processing stage. The useful distinction is between:

- Submitted jobs
- Technically successful jobs
- Accepted user outputs
- Billed jobs

Those counts should be reconciled against request IDs and the provider's billing dashboard. The Image Generation API Cost Guide explains the different billing units in more detail.

OpenRouter credits and API-key controls

A marketplace or gateway introduces another layer of cost management. With OpenRouter, it is useful to distinguish between:

- Account credits
- API-key limits
- Model and provider pricing
- Free-model rate limits
- BYOK usage
- Permission errors
- Insufficient-credit errors

These concepts are related, but they are not interchangeable. For example, a key may have a spending limit even when the account still has credits. A request may also fail because of a permission or policy restriction rather than insufficient balance.

OpenRouter currently reserves the right to expire unused credits one year after purchase.
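The distinction between a key's spending limit and the account's remaining credits can be made explicit as two separate checks. The sketch below is a hypothetical local pre-flight guard. It assumes you track remaining account credits, each key's spending limit, and each key's usage yourself, for example from your own request log. It is not OpenRouter's API, and the provider still enforces its own limits server-side. Permission, guardrail, and moderation failures cannot be predicted this way and still have to be handled from the actual response.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpendState:
    """Locally tracked balances in USD (hypothetical structure, not a provider API)."""
    account_credits: float      # remaining credits on the whole account
    key_limit: Optional[float]  # spending cap configured for this API key; None = no cap
    key_usage: float            # amount already spent through this key


def blocking_budget(state: SpendState, estimated_cost: float) -> Optional[str]:
    """Name the budget that would stop a request of this estimated cost, or None if both allow it.

    The key limit is checked first: a key can run out even while the account still has credits.
    """
    if state.key_limit is not None and state.key_usage + estimated_cost > state.key_limit:
        return "key_limit"
    if estimated_cost > state.account_credits:
        return "account_credits"
    return None


if __name__ == "__main__":
    # Plenty of account credits, but this key has almost reached its own cap.
    state = SpendState(account_credits=50.0, key_limit=10.0, key_usage=9.5)
    print(blocking_budget(state, estimated_cost=1.0))   # key_limit
    print(blocking_budget(state, estimated_cost=0.25))  # None
```

Returning the name of the binding budget, rather than a plain yes/no, keeps "this key is capped" and "the account is empty" as different entries in your logs.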
An HTTP 402 normally indicates insufficient credits, while 403 generally points to a permission, guardrail, or moderation restriction. I documented the operational checks separately in OpenRouter Credits.

The numbers I would log

For a text API request, I would record at least:

- request_id
- provider
- model
- input_tokens
- output_tokens
- estimated_cost
- status

For a more useful production record, I would add:

- cached_input_tokens
- retry_count
- latency
- provider_reported_cost
- created_at

For media jobs, I would also record:

- job_id
- duration_or_resolution
- submitted_at
- completed_at
- failure_stage
- accepted_by_user

Without request-level records, it is difficult to answer basic billing questions:

- Did total request volume increase?
- Did the production model change?
- Did the average output become longer?
- Did retry volume increase?
- Did users regenerate more images?
- Did a timeout produce duplicate jobs?
- Did provider-reported cost differ from the local estimate?

A better budget model

I now separate cost planning into four layers.

1. Listed unit cost

This comes from provider pricing documentation. Examples include:

- price per 1M input tokens
- price per 1M output tokens
- price per generated image
- price per generated video second

2. Workload assumptions

These are specific to the product:

- active users
- requests per user
- average input tokens
- average output tokens
- tool calls per task
- attempts per accepted image
- generated video duration

3. Operational multipliers

These may include:

- retries
- duplicate submissions
- cache writes
- cache reads
- rejected outputs
- failed jobs

These should be editable assumptions rather than universal facts.

4. Planning buffer

A budget buffer can help when usage is uncertain, but it should not hide the underlying estimate. The calculator should show both:

- base API cost
- planned operational budget

What the calculation cannot tell you

A cost calculator has important limitations.
It cannot determine:

- Which model gives the best output quality for your product
- Whether every failed job will be billed
- Future model prices
- Regional taxes
- Currency conversion costs
- Provider-specific negotiated discounts
- Real retry behavior before production data exists

Price is also not a measure of output quality. A calculator is a planning tool, not a replacement for provider usage records or billing dashboards.

The benchmark and calculator

I put the complete model into the 2026 AI API Cost Benchmark. It includes:

- An interactive LLM application calculator
- A coding-agent task calculator
- Image-generation workload estimates
- Per-second and per-job video estimates
- Prototype, production, and agent-heavy presets
- Retry and planning-buffer controls
- Official source links
- Pricing snapshot dates
- A downloadable CSV dataset
- Methodology and limitations

The current pricing snapshot was reviewed on June 17, 2026. The calculator is free to use and does not require an API key. Pricing changes frequently, so verify current provider documentation before launching a production workload and compare the estimate with a small real-world test.

What caused the largest difference between your original estimate and your actual AI bill: output tokens, repeated agent context, retries, or media regeneration?