How to Cut Claude Code API Costs 70% in 5 Minutes with RelayPlane
I spent $340 on Anthropic API calls last month running Claude Code through OpenClaw. Most of that was Opus tokens burning on tasks that didn't need Opus. File reads, config checks, formatting, git status. All routed to the most expensive model because that's what was configured.
The fix took me five minutes. I dropped a local proxy between OpenClaw and Anthropic's API, and now simple tasks hit Sonnet or Haiku instead of Opus. My agents didn't notice. My bill did.
Here's the exact setup.
The 5-Minute Walkthrough
Three commands, one environment variable. That's it.
Step 1: Install the proxy
```
npm install -g @relayplane/proxy
```

You need Node.js 18+. If you're running Claude Code, you already have it.
Step 2: Initialize and start
```
relayplane init
relayplane start
```

`relayplane init` creates a config file with complexity-based routing out of the box. It asks which providers you use and sets up sensible defaults. The proxy starts on port 4100.
Step 3: Point OpenClaw at it
```
export ANTHROPIC_BASE_URL=http://localhost:4100
```

That's the whole trick. OpenClaw thinks it's talking directly to Anthropic. The proxy intercepts every request, classifies the task complexity, and routes it to the cheapest model that can handle it. Complex architecture decisions still go to Opus. Your "read this file and tell me what changed" requests go to Haiku.
Restart your OpenClaw agents (or just start a new session), and you're running through the proxy.
Verify it's working:
```
relayplane stats
```

You'll see a breakdown of which models handled which requests and what you're saving. There's also a dashboard at http://localhost:4100 with real-time cost tracking.
Done. That's five minutes.
Before vs. After: The Math
Let's run the numbers with Anthropic's actual current pricing.
Current Claude API pricing (per million tokens):
| Model | Input | Output |
|---|---|---|
| Claude Opus 4.6 | $5.00 | $25.00 |
| Claude Sonnet 4.6 | $3.00 | $15.00 |
| Claude Haiku 4.5 | $1.00 | $5.00 |
Now here's the thing most people don't realize about their Claude Code usage: the majority of requests are simple. I tracked mine for a week. Roughly 55-60% of all API calls were things like reading files, checking status, formatting output, writing boilerplate, and running simple tool calls. Another 25% were medium-complexity tasks like code review and refactoring. Only about 15% actually needed the full reasoning power of Opus.
Here's what that looks like on a $200/month bill:
Without proxy (everything hits Opus 4.6):
| Task Type | % of Tokens | Monthly Cost |
|---|---|---|
| Simple (file reads, formatting, status) | ~60% | $120 |
| Medium (code review, refactoring) | ~25% | $50 |
| Complex (architecture, hard debugging) | ~15% | $30 |
| Total | 100% | $200 |
With RelayPlane routing:
| Task Type | Routed To | Monthly Cost |
|---|---|---|
| Simple | Haiku 4.5 ($1/$5) | $24 |
| Medium | Sonnet 4.6 ($3/$15) | $30 |
| Complex | Opus 4.6 ($5/$25) | $30 |
| Total |  | $84 |
That's a 58% reduction on this workload: Haiku runs at one fifth of Opus pricing, so the simple tier drops from $120 to $24, and Sonnet at three fifths takes the medium tier from $50 to $30. If you're one of those folks still running legacy Claude Opus 4 at $15/$75 per MTok, the savings are even more dramatic. The same routing logic on those prices pushes you past 70% easily, because the gap between Opus 4 and Haiku is massive.
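If you want to sanity-check the arithmetic, here's a short script that recomputes the bill from the pricing table and the all-Opus baseline above. It assumes each tier's spend scales by the price ratio of the model that now serves it, which is a simplification: real savings depend on the input/output token mix per request (though here Haiku and Sonnet happen to sit at the same ratio to Opus on both input and output).

```python
# Recompute the before/after monthly bill from the pricing ratios above.
# Assumes each tier's spend scales by the price ratio of the model that
# now serves it (a simplification of real per-request token mixes).

OPUS, SONNET, HAIKU = 5.00, 3.00, 1.00  # input $/MTok; output prices scale identically

monthly_opus_spend = {"simple": 120, "medium": 50, "complex": 30}  # all-Opus baseline
routed_price = {"simple": HAIKU, "medium": SONNET, "complex": OPUS}

routed_spend = {
    tier: cost * routed_price[tier] / OPUS
    for tier, cost in monthly_opus_spend.items()
}

before = sum(monthly_opus_spend.values())
after = sum(routed_spend.values())
print(f"before=${before}, after=${after:.0f}, saved={1 - after / before:.0%}")
# → before=$200, after=$84, saved=58%
```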
The key insight: you're not degrading quality on the tasks that matter. Complex reasoning still gets the best model. You're just stopping the waste on tasks that never needed it.
How the Routing Actually Works
RelayPlane doesn't randomly guess which model to use. The default config uses complexity-based classification:
```yaml
routing:
  mode: complexity
  models:
    simple: anthropic/claude-haiku-4-5
    medium: anthropic/claude-sonnet-4-6
    complex: anthropic/claude-opus-4-6
  cascade:
    enabled: true
    strategy: cost
```

The proxy inspects each request, weighing prompt complexity, tool usage, and context length, and routes accordingly. If the cheaper model fails or returns a low-confidence response, cascade mode retries with the next tier up.
You can tune these mappings after a few days by checking `relayplane stats` and adjusting. Maybe your codebase needs Sonnet for tasks the proxy classified as "simple." Just edit the config. The defaults are a solid starting point, though.
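Cascade mode's "retry with the next tier up" behavior can be sketched as follows. The function names and the confidence mechanism are hypothetical stand-ins, not the proxy's internals:

```python
# Sketch of cost-ordered cascade: try the cheapest capable model first,
# escalate on failure or low confidence. `call_model` stands in for the
# real provider call.

TIERS = ["anthropic/claude-haiku-4-5",
         "anthropic/claude-sonnet-4-6",
         "anthropic/claude-opus-4-6"]

def cascade(prompt, call_model, start_tier=0, min_confidence=0.7):
    """Walk up the tiers until a response clears the confidence bar."""
    reply = None
    for model in TIERS[start_tier:]:
        reply, confidence = call_model(model, prompt)
        if reply is not None and confidence >= min_confidence:
            return model, reply
    # Every tier failed; surface the last (most capable) attempt.
    return TIERS[-1], reply

# Fake provider for demonstration: Haiku is unsure, Sonnet answers confidently.
def fake_call(model, prompt):
    return ("ok", 0.9) if "sonnet" in model else ("maybe", 0.4)

print(cascade("summarize the diff", fake_call))
# → ('anthropic/claude-sonnet-4-6', 'ok')
```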
Bonus: Ollama Local Fallback for Dev/Testing
If you're running Ollama locally with something like llama3 or codellama, you can add it as a fallback provider:
```yaml
routing:
  mode: complexity
  models:
    simple: ollama/llama3
    medium: anthropic/claude-sonnet-4-6
    complex: anthropic/claude-opus-4-6
```

Now your simple tasks don't even hit a paid API. They run against your local Ollama instance at zero cost. This is incredible for development and testing, where you're iterating fast and burning through tokens on work that doesn't need cloud-grade intelligence.
The setup is dead simple. If Ollama is running on your machine (`ollama serve`), RelayPlane auto-detects it. No extra config is needed beyond specifying `ollama/model-name` in your routing rules.
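Ollama serves on `localhost:11434` by default, so detection is just a matter of probing that port. Here's a sketch of what such a check can look like; it mimics the kind of probe an auto-detecting proxy might run, not RelayPlane's actual detection code:

```python
# Probe the default Ollama port (11434) to see whether a local instance
# is up. Illustrative only; not RelayPlane's actual detection code.
import urllib.request
import urllib.error

def ollama_available(base="http://localhost:11434", timeout=0.5) -> bool:
    try:
        with urllib.request.urlopen(base, timeout=timeout) as resp:
            return resp.status == 200  # Ollama's root path answers "Ollama is running"
    except (urllib.error.URLError, OSError):
        return False  # nothing listening: fall back to cloud providers

print("local fallback on" if ollama_available() else "cloud only")
```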
For my dev workflow, I route simple and medium tasks to local models and only send complex tasks to Anthropic. My API bill during development dropped to almost nothing.
What About Privacy?
Quick note since this comes up every time: RelayPlane runs entirely on your machine. Your prompts and API keys never leave localhost. There's no cloud relay and no collection of prompt content. The proxy is a local Node.js process and the dashboard is a local web server.
One caveat: telemetry is on by default. It sends only anonymous aggregate stats (token counts, latency), never prompt content. Check its status with `relayplane telemetry` and disable it with `relayplane telemetry off`.
The One-Liner Summary
You're probably overpaying for Claude Code by 60-70% right now because every request, no matter how trivial, hits your most expensive model. A local proxy fixes that in five minutes with zero changes to your agents.
```
npm install -g @relayplane/proxy && relayplane init && relayplane start
```

Then set `ANTHROPIC_BASE_URL=http://localhost:4100` and you're done.
Check the RelayPlane docs for advanced routing configs, multi-provider setups, and budget controls. The free tier has no request limits.
Your agents won't know the difference. Your API bill will.