How to Cut Claude Code API Costs 70% in 5 Minutes with RelayPlane
I spent $340 on Anthropic API calls last month running Claude Code through OpenClaw. Most of that was Opus tokens burning on tasks that didn't need Opus. File reads, config checks, formatting, git status. All routed to the most expensive model because that's what was configured.
The fix took me five minutes. I dropped a local proxy between OpenClaw and Anthropic's API, and now simple tasks hit Sonnet or Haiku instead of Opus. My agents didn't notice. My bill did.
Here's the exact setup.
The 5-Minute Walkthrough
Three commands, one environment variable. That's it.
Step 1: Install the proxy
```
npm install -g @relayplane/proxy
```

You need Node.js 18+. If you're running Claude Code, you already have it.
Step 2: Initialize and start
```
relayplane init
relayplane start
```

`relayplane init` creates a config file with complexity-based routing out of the box. It asks which providers you use and sets up sensible defaults. The proxy starts on port 4100.
Step 3: Point OpenClaw at it
```
export ANTHROPIC_BASE_URL=http://localhost:4100
```

That's the whole trick. OpenClaw thinks it's talking directly to Anthropic. The proxy intercepts every request, classifies the task complexity, and routes it to the cheapest model that can handle it. Complex architecture decisions still go to Opus. Your "read this file and tell me what changed" requests go to Haiku.
Restart your OpenClaw agents (or just start a new session), and you're running through the proxy.
Verify it's working:
```
relayplane stats
```

You'll see a breakdown of which models handled which requests and what you're saving. There's also a dashboard at http://localhost:4100 with real-time cost tracking.
Done. That's five minutes.
Before vs. After: The Math
Let's run the numbers with Anthropic's actual current pricing.
Current Claude API pricing (per million tokens):
| Model | Input | Output |
|---|---|---|
| Claude Opus 4.6 | $5.00 | $25.00 |
| Claude Sonnet 4.6 | $3.00 | $15.00 |
| Claude Haiku 4.5 | $1.00 | $5.00 |
Now here's the thing most people don't realize about their Claude Code usage: the majority of requests are simple. I tracked mine for a week. Roughly 55-60% of all API calls were things like reading files, checking status, formatting output, writing boilerplate, and running simple tool calls. Another 25% were medium-complexity tasks like code review and refactoring. Only about 15% actually needed the full reasoning power of Opus.
Here's what that looks like on a $200/month bill:
Without proxy (everything hits Opus 4.6):
| Task Type | % of Tokens | Monthly Cost |
|---|---|---|
| Simple (file reads, formatting, status) | ~60% | $120 |
| Medium (code review, refactoring) | ~25% | $50 |
| Complex (architecture, hard debugging) | ~15% | $30 |
| Total | 100% | $200 |
With RelayPlane routing:
| Task Type | Routed To | Monthly Cost |
|---|---|---|
| Simple | Haiku 4.5 ($1/$5) | $24 |
| Medium | Sonnet 4.6 ($3/$15) | $30 |
| Complex | Opus 4.6 ($5/$25) | $30 |
| Total |  | $84 |
That's a 58% reduction on this workload: Haiku runs at one fifth of Opus pricing, so the simple tier drops from $120 to $24, and Sonnet at three fifths takes the medium tier from $50 to $30. If you're one of those folks still running legacy Claude Opus 4 at $15/$75 per MTok, the savings are even more dramatic. The same routing logic on those prices pushes you past 70% easily, because the gap between Opus 4 and Haiku is massive.
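If you want to sanity-check the arithmetic, here's a short script that recomputes the bill from the pricing table and the all-Opus baseline above. It assumes each tier's spend scales by the price ratio of the model that now serves it, which is a simplification: real savings depend on the input/output token mix per request (though here Haiku and Sonnet happen to sit at the same ratio to Opus on both input and output).

```python
# Recompute the before/after monthly bill from the pricing ratios above.
# Assumes each tier's spend scales by the price ratio of the model that
# now serves it (a simplification of real per-request token mixes).

OPUS, SONNET, HAIKU = 5.00, 3.00, 1.00  # input $/MTok; output prices scale identically

monthly_opus_spend = {"simple": 120, "medium": 50, "complex": 30}  # all-Opus baseline
routed_price = {"simple": HAIKU, "medium": SONNET, "complex": OPUS}

routed_spend = {
    tier: cost * routed_price[tier] / OPUS
    for tier, cost in monthly_opus_spend.items()
}

before = sum(monthly_opus_spend.values())
after = sum(routed_spend.values())
print(f"before=${before}, after=${after:.0f}, saved={1 - after / before:.0%}")
# → before=$200, after=$84, saved=58%
```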
The key insight: you're not degrading quality on the tasks that matter. Complex reasoning still gets the best model. You're just stopping the waste on tasks that never needed it.
How the Routing Actually Works
RelayPlane doesn't randomly guess which model to use. The default config uses complexity-based classification:
```yaml
routing:
  mode: complexity
  models:
    simple: anthropic/claude-haiku-4-5
    medium: anthropic/claude-sonnet-4-6
    complex: anthropic/claude-opus-4-6
  cascade:
    enabled: true
    strategy: cost
```

The proxy inspects each request, weighing prompt complexity, tool usage, and context length, and routes accordingly. If the cheaper model fails or returns a low-confidence response, cascade mode retries with the next tier up.
You can tune these mappings after a few days by checking `relayplane stats` and adjusting. Maybe your codebase needs Sonnet for tasks the proxy classified as "simple." Just edit the config. The defaults are a solid starting point, though.
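Cascade mode's "retry with the next tier up" behavior can be sketched as follows. The function names and the confidence mechanism are hypothetical stand-ins, not the proxy's internals:

```python
# Sketch of cost-ordered cascade: try the cheapest capable model first,
# escalate on failure or low confidence. `call_model` stands in for the
# real provider call.

TIERS = ["anthropic/claude-haiku-4-5",
         "anthropic/claude-sonnet-4-6",
         "anthropic/claude-opus-4-6"]

def cascade(prompt, call_model, start_tier=0, min_confidence=0.7):
    """Walk up the tiers until a response clears the confidence bar."""
    reply = None
    for model in TIERS[start_tier:]:
        reply, confidence = call_model(model, prompt)
        if reply is not None and confidence >= min_confidence:
            return model, reply
    # Every tier failed; surface the last (most capable) attempt.
    return TIERS[-1], reply

# Fake provider for demonstration: Haiku is unsure, Sonnet answers confidently.
def fake_call(model, prompt):
    return ("ok", 0.9) if "sonnet" in model else ("maybe", 0.4)

print(cascade("summarize the diff", fake_call))
# → ('anthropic/claude-sonnet-4-6', 'ok')
```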
Bonus: Ollama Local Fallback for Dev/Testing
If you're running Ollama locally with something like llama3 or codellama, you can add it as a fallback provider:
```yaml
routing:
  mode: complexity
  models:
    simple: ollama/llama3
    medium: anthropic/claude-sonnet-4-6
    complex: anthropic/claude-opus-4-6
```

Now your simple tasks don't even hit a paid API. They run against your local Ollama instance at zero cost. This is incredible for development and testing, where you're iterating fast and burning through tokens on work that doesn't need cloud-grade intelligence.
The setup is dead simple. If Ollama is running on your machine (`ollama serve`), RelayPlane auto-detects it. No extra config is needed beyond specifying `ollama/model-name` in your routing rules.
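Ollama serves on `localhost:11434` by default, so detection is just a matter of probing that port. Here's a sketch of what such a check can look like; it mimics the kind of probe an auto-detecting proxy might run, not RelayPlane's actual detection code:

```python
# Probe the default Ollama port (11434) to see whether a local instance
# is up. Illustrative only; not RelayPlane's actual detection code.
import urllib.request
import urllib.error

def ollama_available(base="http://localhost:11434", timeout=0.5) -> bool:
    try:
        with urllib.request.urlopen(base, timeout=timeout) as resp:
            return resp.status == 200  # Ollama's root path answers "Ollama is running"
    except (urllib.error.URLError, OSError):
        return False  # nothing listening: fall back to cloud providers

print("local fallback on" if ollama_available() else "cloud only")
```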
For my dev workflow, I route simple and medium tasks to local models and only send complex tasks to Anthropic. My API bill during development dropped to almost nothing.
What About Privacy?
Quick note since this comes up every time: RelayPlane runs entirely on your machine. Your prompts and API keys never leave localhost. There's no cloud relay and no collection of prompt content. The proxy is a local Node.js process and the dashboard is a local web server.
One caveat: telemetry is on by default. It sends only anonymous aggregate stats (token counts, latency), never prompt content. Check its status with `relayplane telemetry` and disable it with `relayplane telemetry off`.
The One-Liner Summary
You're probably overpaying for Claude Code by 60-70% right now because every request, no matter how trivial, hits your most expensive model. A local proxy fixes that in five minutes with zero changes to your agents.
```
npm install -g @relayplane/proxy && relayplane init && relayplane start
```

Then set `ANTHROPIC_BASE_URL=http://localhost:4100` and you're done.
Check the RelayPlane docs for advanced routing configs, multi-provider setups, and budget controls. The free tier has no request limits.
Your agents won't know the difference. Your API bill will.