How to Reduce Claude Code API Costs

Claude Code routes every single request through Opus by default. That is a 10x price multiplier on tasks Haiku handles perfectly well. Here is how to fix it in 30 seconds, with zero code changes.

Why Claude Code API Costs Spike Unexpectedly

Claude Code is remarkably capable, but that capability comes at a price. When you open a project, Claude Code fires off dozens of API calls before you type a single prompt: file indexing, context loading, tool discovery, status checks. Every one of those calls goes to your configured model, which is almost always claude-opus-4-5 or claude-sonnet-4-5.

The pricing gap is massive

Claude Opus input tokens cost roughly 15x more than Haiku. A busy coding day easily burns 2-4 million tokens. Routed to Haiku for simple tasks, those same tokens cost pennies instead of dollars.

Typical Claude Code session breakdown

Request type
Share of calls
Best model
File reads / grep
~30%
Haiku
Status / ping
~15%
Haiku
Simple edits
~20%
Sonnet
Refactors
~15%
Sonnet
Architecture / reasoning
~20%
Opus

About 65% of requests do not need Opus. If every call hits Opus anyway, you are overpaying by a factor of 3-10x on most of your traffic.

How RelayPlane Intercepts Every Request

RelayPlane is a local proxy that runs on your machine. It listens on http://localhost:4100 and forwards requests to the Anthropic API, rewriting the model field based on complexity analysis before the request leaves your machine.

Request flow

Claude Codesends request tolocalhost:4100
RelayPlane inspects prompt complexity
routes tohaikuorsonnetoropus
proxied toapi.anthropic.comwith your API key

Because Claude Code reads the ANTHROPIC_BASE_URL environment variable, you redirect all traffic with a single export. Your API key never leaves your machine and no prompts are logged on any server.

Setting Up in 30 Seconds

Three terminal commands. No config files, no accounts, no API key changes.

1. Install the proxy

$npm install -g @relayplane/proxy

2. Start the local proxy

$relayplane start

3. Point Claude Code at the proxy

$export ANTHROPIC_BASE_URL=http://localhost:4100

4. Launch Claude Code normally

$claude

Persist across sessions: Add the export to your ~/.zshrc or ~/.bashrc so every new terminal session points at the proxy automatically.

Reading the Cost Dashboard

Once the proxy is running, open the local dashboard at http://localhost:4100/dashboard. It shows every request in real time: the original model requested, the model actually used, token counts, and the cost difference.

Dashboard columns

Model requestedWhat Claude Code asked for, e.g. claude-opus-4-5
Model usedWhat RelayPlane actually routed to (often Haiku or Sonnet)
Tokens in / outInput and output token counts for the request
Cost savedDollar difference between original model price and routed model price
Cumulative savingsRunning total since the proxy started this session

The dashboard is served from SQLite on disk. Nothing leaves your machine. You can export the full request log as CSV for expense reporting or to share with your team.

Routing Haiku vs Opus Automatically

RelayPlane uses prompt complexity signals to decide which model to use. You can also set routing rules manually in ~/.relayplane/config.json.

Auto (default)

Complexity-based routing. Short reads and pings go to Haiku, complex reasoning goes to Opus.

Cost priority

Maximizes savings. Routes to Haiku unless the prompt explicitly requires deeper reasoning.

Quality priority

Prefers Sonnet or Opus. Downgrades only obvious utility calls like file stat checks.

Example config snippet

{
  "routing": {
    "mode": "auto",
    "rules": [
      { "match": "file_read",  "model": "claude-haiku-4-5" },
      { "match": "simple_edit","model": "claude-sonnet-4-5" },
      { "match": "default",    "model": "claude-opus-4-5"  }
    ]
  }
}

Real Cost Example: Before vs After

Here is a typical full-day Claude Code session with a medium-sized TypeScript project, about 200 requests and 3 million tokens processed.

Before RelayPlane

200 requestsall Opus
3M input tokens$45.00
600K output tokens$22.50
Daily total$67.50
~$1,350/month

After RelayPlane (auto mode)

130 req via Haiku$0.39
50 req via Sonnet$6.75
20 req via Opus$9.00
Daily total$16.14
~$323/month

76% reduction in this example

Saving $1,027/month for one developer. Team plans multiply this.

What You Do Not Lose

Complex reasoning and architecture tasks still route to Opus automatically
If a cheaper model fails, the proxy retries with a more capable model
Your API key is used directly with Anthropic, no third-party billing
Zero code changes required in your project files
Proxy falls back to direct API calls if localhost:4100 is unreachable
MIT licensed, runs entirely on your machine, works offline

Start saving on Claude Code costs today

Free tier. 30-second setup. No account required to start.