# How to Route Claude API Requests Through a Local Proxy (With Automatic Cost Savings)
If you're running Claude in production, you're probably sending requests straight to api.anthropic.com. That works, but it means every request hits the same model at the same price, with no visibility into what you're spending per task.
A local proxy sits between your app and Anthropic's API. It intercepts requests, routes them to the right model based on complexity, tracks costs per request, and gives you a dashboard to see where your money goes. No code changes required on your end.
## Setting ANTHROPIC_BASE_URL
Claude's SDKs and tools all respect the ANTHROPIC_BASE_URL environment variable. Point it at your proxy and every request flows through it automatically.
```shell
export ANTHROPIC_BASE_URL=http://localhost:4100
```

That's it. Your existing code keeps calling `anthropic.messages.create()` exactly the same way. The SDK sends the request to your proxy instead of Anthropic directly.
This works with:
- Claude Code CLI (reads ANTHROPIC_BASE_URL from your shell)
- Python SDK (`anthropic.Anthropic()` picks it up automatically)
- TypeScript SDK (same behavior)
- Any OpenAI-compatible client (the proxy speaks both protocols)
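With the Python SDK, for example, the variable is read at client construction, so no code changes are needed. A minimal sketch (the fallback URL shown is the SDK's default endpoint):

```python
import os

# Point the SDK at a local proxy instead of api.anthropic.com.
os.environ["ANTHROPIC_BASE_URL"] = "http://localhost:4100"

# anthropic.Anthropic() picks up ANTHROPIC_BASE_URL automatically;
# you can also pass it explicitly:
#   client = anthropic.Anthropic(base_url=os.environ["ANTHROPIC_BASE_URL"])

base_url = os.environ.get("ANTHROPIC_BASE_URL", "https://api.anthropic.com")
print(base_url)  # → http://localhost:4100
```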
## What the Proxy Actually Does
A good proxy does three things that direct API calls don't give you:
1. Smart model routing. Not every request needs Opus. A proxy classifies incoming requests by complexity and routes simple tasks to Haiku, moderate tasks to Sonnet, and only sends genuinely complex work to Opus. You get the same quality output at a fraction of the cost.
2. Per-request cost tracking. Instead of one big Anthropic invoice at month end, you see exactly what each request cost, which model handled it, and how many tokens it consumed. This is the difference between "AI costs $2,000/month" and "the summarization pipeline costs $0.003 per document."
3. Budget enforcement. Set daily or weekly spending limits. The proxy blocks requests that would exceed your budget instead of letting costs run away while you sleep.
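The routing idea in point 1 can be sketched in a few lines. The keyword-and-length heuristic and the tier names below are illustrative assumptions, not RelayPlane's actual classifier; real deployments would map each tier to a concrete model ID:

```python
# Hypothetical complexity classifier: route by keywords and length.
def classify(prompt: str) -> str:
    hard_keywords = ("refactor", "prove", "analyze", "debug")
    if any(k in prompt.lower() for k in hard_keywords):
        return "complex"
    if len(prompt) > 500:
        return "moderate"
    return "simple"

# Tier -> model mapping (tiers stand in for concrete model IDs).
ROUTES = {
    "simple": "haiku",
    "moderate": "sonnet",
    "complex": "opus",
}

def route(prompt: str) -> str:
    """Pick a model tier for an incoming request."""
    return ROUTES[classify(prompt)]

print(route("Format this date: 2024-01-01"))  # → haiku
```

The key design point is that classification happens per request, so a single client can transparently fan out across all three models.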
## Setting Up RelayPlane in 3 Lines
RelayPlane is an open-source proxy built for this exact use case. It's a single npm package, no Docker, no infrastructure.
```shell
npm install -g @relayplane/proxy
relayplane init
export ANTHROPIC_API_KEY=sk-ant-...
relayplane start
```

The proxy starts on port 4100. Set `ANTHROPIC_BASE_URL=http://localhost:4100` and you're routing through it.
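To sanity-check the setup, you can send one request through the proxy by hand. This sketch assumes the proxy forwards standard Anthropic Messages API requests unchanged; the model name is illustrative:

```shell
export ANTHROPIC_BASE_URL=http://localhost:4100

# Only send the test request if the proxy is actually reachable.
if curl -sf -o /dev/null "$ANTHROPIC_BASE_URL"; then
  curl -s "$ANTHROPIC_BASE_URL/v1/messages" \
    -H "x-api-key: $ANTHROPIC_API_KEY" \
    -H "anthropic-version: 2023-06-01" \
    -H "content-type: application/json" \
    -d '{"model":"claude-3-5-haiku-latest","max_tokens":64,"messages":[{"role":"user","content":"ping"}]}'
else
  echo "proxy not reachable at $ANTHROPIC_BASE_URL"
fi
```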
The dashboard at http://localhost:4100 shows real-time cost tracking, model usage breakdown, and request history. No separate analytics tool needed.
## Real Cost Savings: Haiku vs Opus Routing
Here's what this looks like with real numbers. Say you're running an AI agent that makes 1,000 requests per day. Without routing, everything goes to Claude Sonnet:
| Scenario | Model | Cost per 1K requests (est.) |
|---|---|---|
| No proxy | All Sonnet | ~$15 |
| With routing | 70% Haiku, 25% Sonnet, 5% Opus | ~$4.50 |
That's roughly a 70% cost reduction without changing your prompts, your code, or your output quality. The simple requests (status checks, formatting, classification) go to Haiku where they belong. The hard requests (code generation, analysis, reasoning) still get Opus.
Over a month, that's the difference between a $450 API bill and a $135 one. Over a year, you're saving nearly $4,000 on a single workload.
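The monthly and annual figures follow directly from the per-day estimates in the table. A quick check of the arithmetic:

```python
# Back-of-envelope savings from the per-day estimates above.
daily_without = 15.00  # all-Sonnet, ~$15 per 1K requests/day
daily_with = 4.50      # 70/25/5 Haiku/Sonnet/Opus mix

monthly_without = daily_without * 30                     # $450
monthly_with = daily_with * 30                           # $135
annual_savings = (monthly_without - monthly_with) * 12   # $3,780

print(monthly_without, monthly_with, annual_savings)
```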
## When You Actually Need This
If you're making fewer than 50 Claude API calls per day, the direct API is fine. The overhead of running a proxy isn't worth it at that scale.
But if you're building AI agents, running batch processing, or serving multiple users through Claude, a local proxy pays for itself on day one. The cost visibility alone is worth the 3-minute setup, even before you factor in automatic model routing.
The proxy is open source (MIT), runs locally, and your API keys never leave your machine. Your data flows through localhost, not a third-party service.
```shell
npm install -g @relayplane/proxy
relayplane init
relayplane start
# That's it. Check http://localhost:4100 for your dashboard.
```