
Claude Code API Costs: Where the Tokens Go and How to Cut Them

Matt Turley · 7 min read

Claude Code makes a lot of API calls. When you run it on a real codebase, it reads files, searches for patterns, checks git history, and runs tool calls in tight loops. Each of those operations hits the Anthropic API. The model doing the heavy lifting is often Claude Opus or Sonnet, which means costs add up quickly on a session that runs for more than a few minutes.

Most developers do not notice the cost until they check their Anthropic dashboard at the end of the week. By then it is too late to know which session caused the spike or which type of operation was the expensive one. Here is a breakdown of where Claude Code API costs come from and what you can do about it.


Where the Tokens Go

Claude Code costs come from three main sources, and they are not equal.

Context window size. Every API call includes the full conversation context: your system prompt, previous messages, tool call results, and the current request. As a session gets longer, the input token count grows with every call. A long debugging session can push 50,000+ input tokens per request near the end of the session. At Opus pricing, that adds up fast.
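To make the growth concrete, here is a back-of-the-envelope calculation. The per-token price below is an assumption for an Opus-class model; check Anthropic's current pricing page for the real rates:

```python
# Back-of-the-envelope: input-side cost of a single API call as the
# context grows. ASSUMED rate: $15 per million input tokens
# (Opus-class); verify against Anthropic's current pricing.
PRICE_PER_INPUT_TOKEN = 15.00 / 1_000_000

def input_cost(context_tokens: int) -> float:
    """Dollar cost of the input side of one API call."""
    return context_tokens * PRICE_PER_INPUT_TOKEN

# Early in a session vs. late in a long debugging session:
print(f"10k-token context: ${input_cost(10_000):.2f} per call")
print(f"50k-token context: ${input_cost(50_000):.2f} per call")
```

At 50,000 input tokens, every single call costs around 75 cents on the input side alone, before any output tokens are counted.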

File reads and tool calls. When Claude Code reads a large file or searches a codebase, the results come back as tool output that gets added to the context. Reading a 300-line file and getting back a grep result with 40 matches is not free. Each of those results is input tokens on the next call.

Model selection. Claude Code defaults to the most capable model available. For many operations, that is overkill. Checking whether a file exists or reading a package.json does not need Opus. But if every call in a session goes to Opus, you pay Opus prices for all of them.


Tracking Cost Per Session

Anthropic's usage page shows aggregate spend by day. It does not break down by session, project, or operation type. If you are running Claude Code across multiple projects and want to know which one costs more, you are guessing without instrumentation.

The practical solution is to route Claude Code through a local proxy that tracks cost per request. RelayPlane does this with no configuration changes to Claude Code itself:

npm install -g @relayplane/proxy
relayplane init
relayplane start

Then set one environment variable before launching Claude Code:

export ANTHROPIC_BASE_URL=http://localhost:4100

Every call Claude Code makes now passes through the local proxy. Each response comes back with cost metadata: input tokens, output tokens, cache read tokens (at the lower cache rate), cache write tokens, and total computed cost. You can see exactly what each operation cost.
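Those four token counts map onto the `usage` object Anthropic returns on every Messages API response. A sketch of turning them into a dollar figure; the per-million-token rates here are placeholders, so substitute the current rates for whichever model you are using:

```python
# Compute per-request cost from Anthropic's standard `usage` fields.
# Rates are PLACEHOLDERS in dollars per million tokens -- substitute
# the current rates for your model before trusting the numbers.
RATES = {
    "input_tokens": 15.00,                 # uncached input
    "output_tokens": 75.00,                # output
    "cache_read_input_tokens": 1.50,       # cache reads (discounted)
    "cache_creation_input_tokens": 18.75,  # cache writes (premium)
}

def request_cost(usage: dict) -> float:
    """Total dollar cost of one API call, given its usage counts."""
    return sum(
        usage.get(field, 0) * rate / 1_000_000
        for field, rate in RATES.items()
    )

usage = {"input_tokens": 1_200, "output_tokens": 400,
         "cache_read_input_tokens": 30_000,
         "cache_creation_input_tokens": 0}
print(f"${request_cost(usage):.4f}")
```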

The proxy writes a local log you can query, so you can look back at a session and see which calls were expensive. A file read that returned 10,000 tokens of context shows up as a line item, not buried in an aggregate number.
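Querying that log for the most expensive calls takes a few lines. This sketch assumes the log is newline-delimited JSON with a per-request `cost` field; the field name and format are hypothetical, so check the schema your proxy version actually writes:

```python
# Find the most expensive calls in a session log.
# ASSUMPTION: newline-delimited JSON, one request per line, each
# record carrying a "cost" field. These names are hypothetical --
# check the actual log schema your proxy version writes.
import json

def top_expensive(log_path: str, n: int = 5) -> list[dict]:
    """Return the n highest-cost request records in the log."""
    with open(log_path) as f:
        records = [json.loads(line) for line in f if line.strip()]
    return sorted(records, key=lambda r: r.get("cost", 0), reverse=True)[:n]
```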


Prompt Caching and What It Actually Saves

Anthropic charges different rates for prompt cache reads versus cache writes versus uncached input tokens. If your system prompt is long and stable across calls, caching it saves money on every subsequent request in a session. Claude Code does use prompt caching for the system prompt.

The caveat is that cache writes cost more than regular input tokens. The first call in a session that caches the system prompt is more expensive, not less. The savings materialize on subsequent calls. For short sessions with one or two turns, caching may not save anything.
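The break-even point falls out of the rate structure. The multipliers below (cache write at 1.25x the base input rate, cache read at 0.1x) are assumptions to be checked against current pricing, but they show the shape of the trade-off:

```python
# When does caching a stable system prompt break even?
# ASSUMED multipliers relative to the base input-token rate:
# cache write = 1.25x, cache read = 0.1x (verify against pricing).
WRITE_MULT, READ_MULT = 1.25, 0.10

def cached_cost(calls: int) -> float:
    """Relative cost: prompt cached on call 1, read on calls 2..n."""
    return WRITE_MULT + READ_MULT * (calls - 1)

def uncached_cost(calls: int) -> float:
    """Relative cost of resending the same prompt uncached every call."""
    return float(calls)

for n in (1, 2, 5):
    print(n, cached_cost(n), uncached_cost(n))
# At n=1 caching costs more (1.25 vs 1.0); by n=2 it is already
# cheaper (1.35 vs 2.0), and the gap widens with every call after.
```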

When you track cost through RelayPlane, cache read tokens and cache write tokens are reported separately from regular input tokens. You can see whether caching is actually saving you money in a given session pattern, rather than assuming it is.


Routing Cheaper Models for Simpler Operations

The highest-leverage cost optimization for Claude Code usage is model routing: send simple operations to a cheaper model and reserve the expensive one for tasks that need it.

RelayPlane supports complexity-based routing. You configure which model handles "simple" requests and which handles "complex" ones. The proxy inspects each request and routes accordingly. For Claude Code, this means file reads, simple grep results, and short tool call responses can go to Haiku or Sonnet, while multi-step reasoning and code generation still use Opus.

The routing config is explicit. You decide the rules; the proxy applies them consistently. A typical config that routes simple requests to Haiku cuts total session cost by 40-60% with no change to output quality on the tasks that matter.

# relayplane.config.json (simplified)
{
  "routing": {
    "simple": "claude-haiku-4-5-20251001",
    "complex": "claude-opus-4-6"
  },
  "budget": {
    "daily": 10.00
  }
}
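The classifier itself lives inside the proxy, but the idea behind complexity-based routing can be sketched with a simple heuristic. The signal (request length) and threshold below are made up for illustration and are not RelayPlane's real rules; the model IDs match the config above:

```python
# Illustrative sketch of complexity-based routing -- NOT RelayPlane's
# actual classifier. The length signal and threshold are invented
# for illustration only.
SIMPLE_MODEL = "claude-haiku-4-5-20251001"
COMPLEX_MODEL = "claude-opus-4-6"

def route(request_text: str, max_simple_chars: int = 2_000) -> str:
    """Send short requests (file reads, grep output) to the cheap
    model; longer, reasoning-heavy requests to the expensive one."""
    if len(request_text) < max_simple_chars:
        return SIMPLE_MODEL
    return COMPLEX_MODEL
```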

Budget Enforcement

The second useful feature for Claude Code cost control is a hard budget limit. If you have a runaway session or an agent loop that calls the API in a tight cycle, costs can spike before you notice. RelayPlane can block requests when a daily or per-session budget is exceeded, rather than letting the spend continue until you check the dashboard.

The enforcement behavior is configurable: block, warn, or downgrade to a cheaper model when the budget threshold is hit. For personal development use, a $10 daily limit with downgrade behavior is a practical starting point. For team use, per-user or per-project budgets can be set via the RelayPlane API.


What This Looks Like in Practice

The workflow for instrumenting Claude Code with cost tracking is three commands:

npm install -g @relayplane/proxy
relayplane init    # set your Anthropic API key
relayplane start   # proxy runs on localhost:4100

Then in your shell profile or before launching Claude Code:

export ANTHROPIC_BASE_URL=http://localhost:4100

From that point on, every Claude Code session goes through the proxy. Costs are logged per request. You can check the session log after a working session and see exactly where the tokens went. Most developers find one or two patterns that account for the majority of cost: usually a very large file being read repeatedly, or a long session that kept growing the context window without being reset.

Fixing those patterns, combined with routing simpler operations to a cheaper model, tends to cut Claude Code API spend by 50-70% without any change to what you can actually accomplish in a session.

npm install -g @relayplane/proxy

Start there, then look at your first session log.


RelayPlane is open source: github.com/RelayPlane/proxy. Package: @relayplane/proxy on npm. Supports Anthropic, OpenAI, and 9 other providers. Last verified: 2026-03-12.