How to Reduce Claude Code API Costs
Claude Code routes every single request through Opus by default. That is a 10x price multiplier on tasks Haiku handles perfectly well. Here is how to fix it in 30 seconds, with zero code changes.
Why Claude Code API Costs Spike Unexpectedly
Claude Code is remarkably capable, but that capability comes at a price. When you open a project, Claude Code fires off dozens of API calls before you type a single prompt: file indexing, context loading, tool discovery, status checks. Every one of those calls goes to your configured model, which is almost always claude-opus-4-5 or claude-sonnet-4-5.
The pricing gap is massive
Claude Opus input tokens cost roughly 15x more than Haiku. A busy coding day easily burns 2-4 million tokens. Routed to Haiku for simple tasks, those same tokens cost pennies instead of dollars.
Typical Claude Code session breakdown
About 65% of requests do not need Opus. If every call hits Opus anyway, you are overpaying by a factor of 3-10x on most of your traffic.
How RelayPlane Intercepts Every Request
RelayPlane is a local proxy that runs on your machine. It listens on http://localhost:4100 and forwards requests to the Anthropic API, rewriting the model field based on complexity analysis before the request leaves your machine.
Request flow
Because Claude Code reads the ANTHROPIC_BASE_URL environment variable, you redirect all traffic with a single export. Your API key never leaves your machine and no prompts are logged on any server.
Setting Up in 30 Seconds
Three terminal commands. No config files, no accounts, no API key changes.
1. Install the proxy
npm install -g @relayplane/proxy2. Start the local proxy
relayplane start3. Point Claude Code at the proxy
export ANTHROPIC_BASE_URL=http://localhost:41004. Launch Claude Code normally
claudePersist across sessions: Add the export to your ~/.zshrc or ~/.bashrc so every new terminal session points at the proxy automatically.
Reading the Cost Dashboard
Once the proxy is running, open the local dashboard at http://localhost:4100/dashboard. It shows every request in real time: the original model requested, the model actually used, token counts, and the cost difference.
Dashboard columns
The dashboard is served from SQLite on disk. Nothing leaves your machine. You can export the full request log as CSV for expense reporting or to share with your team.
Routing Haiku vs Opus Automatically
RelayPlane uses prompt complexity signals to decide which model to use. You can also set routing rules manually in ~/.relayplane/config.json.
Auto (default)
Complexity-based routing. Short reads and pings go to Haiku, complex reasoning goes to Opus.
Cost priority
Maximizes savings. Routes to Haiku unless the prompt explicitly requires deeper reasoning.
Quality priority
Prefers Sonnet or Opus. Downgrades only obvious utility calls like file stat checks.
Example config snippet
{
"routing": {
"mode": "auto",
"rules": [
{ "match": "file_read", "model": "claude-haiku-4-5" },
{ "match": "simple_edit","model": "claude-sonnet-4-5" },
{ "match": "default", "model": "claude-opus-4-5" }
]
}
}Real Cost Example: Before vs After
Here is a typical full-day Claude Code session with a medium-sized TypeScript project, about 200 requests and 3 million tokens processed.
Before RelayPlane
After RelayPlane (auto mode)
76% reduction in this example
Saving $1,027/month for one developer. Team plans multiply this.
What You Do Not Lose
Start saving on Claude Code costs today
Free tier. 30-second setup. No account required to start.