OpenClaw API Costs Explained: What You're Paying and How to Track Every Token
I've been running OpenClaw in production for months now. The first month, my Anthropic bill was roughly what I expected. The second month, it was 3x higher. Not because pricing changed, but because I had zero visibility into where tokens were actually going.
If you're running OpenClaw and want to understand your OpenClaw API costs before they surprise you, this is the breakdown I wish I'd had on day one.
How OpenClaw Billing Actually Works
OpenClaw doesn't charge you for API access. There's no markup, no platform fee on tokens. Every API call your agents make passes straight through to Anthropic, and you pay Anthropic's per-token rates directly.
As of early 2026, that means:
- Claude Haiku 4.5: $1.00 / $5.00 per million tokens (input/output)
- Claude Sonnet 4.6: $3.00 / $15.00 per million tokens (input/output)
- Claude Opus 4.6: $5.00 / $25.00 per million tokens (input/output)
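To make the math concrete, here's a minimal per-request cost calculator using the rates above (hardcoded; adjust if Anthropic's pricing changes — the model keys are shorthand labels, not official API model IDs):

```python
# Per-million-token rates from the table above: (input, output) in USD.
RATES = {
    "haiku-4.5": (1.00, 5.00),
    "sonnet-4.6": (3.00, 15.00),
    "opus-4.6": (5.00, 25.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single API call at the listed rates."""
    in_rate, out_rate = RATES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A typical agent turn: 12K tokens of context in, 800 tokens out on Sonnet.
print(round(request_cost("sonnet-4.6", 12_000, 800), 4))  # → 0.048
```

A nickel per turn sounds cheap until you multiply by thousands of agent turns a day, which is exactly why attribution matters.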
The OpenClaw API price is Anthropic's price. Full stop. Your bill is determined by which models your agents hit, how many tokens they consume, and how often they run. OpenClaw is the orchestration layer, not the billing layer.
This sounds simple, but it creates a real problem: OpenClaw doesn't give you a per-request cost breakdown out of the box. You see a single Anthropic invoice at the end of the month with no way to attribute costs to specific agents, sessions, or tasks.
The Hidden Cost: Context Window Creep in Long Sessions
The biggest cost driver I've found isn't the model you pick. It's context window accumulation in long-running agent sessions.
Here's how it works. Every message in a session gets appended to the context window. By message 15, you might be sending 80K input tokens per request, because the full conversation history ships with every call. A session that starts at $0.02/request can quietly climb to $0.40/request by the end.
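The accumulation is easy to simulate. This sketch assumes Sonnet rates from the pricing list and roughly 5K tokens of new history per turn; the exact numbers are illustrative, but the shape of the curve is the point:

```python
# Sketch: per-request cost growth when the full history ships every call.
# Assumes Sonnet 4.6 rates ($3 in / $15 out per million tokens) and ~5K
# tokens of new history per turn; numbers are illustrative.
IN_RATE, OUT_RATE = 3.00, 15.00  # USD per million tokens

def session_costs(turns: int, tokens_per_turn: int = 5_000, output_tokens: int = 500):
    costs = []
    context = 2_000  # starting system prompt + first message
    for _ in range(turns):
        cost = (context * IN_RATE + output_tokens * OUT_RATE) / 1_000_000
        costs.append(round(cost, 4))
        context += tokens_per_turn + output_tokens  # history accumulates
    return costs

costs = session_costs(15)
print(costs[0], costs[-1])  # first turn ≈ $0.0135, turn 15 ≈ $0.2445
```

By turn 15 the context is near 80K tokens and each request costs roughly 18x the first one, even though the model and the task haven't changed.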
For autonomous agents running on cron jobs or heartbeat loops, this compounds fast. I had one agent running a 45-minute research session that burned through $12 in a single loop, mostly because the context window grew unchecked.
The fix is architectural: keep sessions short, split complex tasks into sub-agent spawns (each gets a fresh context), and be aggressive about what you include in system prompts.
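The sub-agent math is worth seeing directly. Because input overhead grows with the square of session length, splitting one long session into fresh-context sub-agents cuts total input tokens sharply (illustrative assumption: ~5K tokens of history added per turn):

```python
# Sketch: total re-sent input tokens for one long session vs. the same
# work split into sub-agents that each start with a fresh context.
# Assumes ~5K tokens of history per turn; purely illustrative.

def total_input_tokens(turns: int, per_turn: int = 5_000) -> int:
    # Full history is re-sent on each call: 0, 5K, 10K, ... of overhead.
    return sum(k * per_turn for k in range(turns))

one_session = total_input_tokens(16)   # one 16-turn session
split = 4 * total_input_tokens(4)      # four 4-turn sub-agent sessions
print(one_session, split)  # 600000 vs 120000 tokens: a 5x reduction
```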
Per-Request Cost Tracking: See Exactly Where Your Money Goes
This is the problem that pushed me to build RelayPlane. I needed per-request cost attribution, broken down by agent, session, and task.
RelayPlane sits between OpenClaw and Anthropic as a local proxy. Every request passes through it, and it logs the exact token counts, model used, and calculated cost. Setup takes about two minutes:
```
npm install -g @relayplane/proxy
```

Then point OpenClaw's Anthropic provider at the proxy:
```
openclaw config set models.providers.anthropic.baseUrl http://localhost:4100
```

That's it. Your existing API key passes through untouched. RelayPlane doesn't store keys or modify requests. It just observes and logs.
Now every request shows up in your local dashboard with:
- Input/output token counts
- Model used (and whether it was auto-routed)
- Calculated cost at current rates
- Agent ID and session ID for attribution
- Timestamp and latency
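Once you have records with those fields, attribution is a ten-line aggregation. The record layout below mirrors the dashboard list but is my own assumption, not RelayPlane's documented log schema:

```python
# Sketch: finding the agent that dominates your bill from per-request
# records. Field names are assumed, not RelayPlane's documented schema.
from collections import defaultdict

records = [
    {"agent_id": "researcher", "cost_usd": 0.31, "model": "opus-4.6"},
    {"agent_id": "researcher", "cost_usd": 0.28, "model": "opus-4.6"},
    {"agent_id": "summarizer", "cost_usd": 0.02, "model": "haiku-4.5"},
]

spend_by_agent = defaultdict(float)
for r in records:
    spend_by_agent[r["agent_id"]] += r["cost_usd"]

# Sort by spend to surface the agent worth optimizing first.
for agent, spend in sorted(spend_by_agent.items(), key=lambda kv: -kv[1]):
    print(f"{agent}: ${spend:.2f}")
```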
When I finally saw the per-request breakdown, I found that 60% of my monthly cost came from a single agent that was re-reading large files into context on every heartbeat cycle. Five-minute fix. Saved $80/month.
Model Routing: Automatic Haiku/Sonnet/Opus Selection by Complexity
OpenClaw supports model routing, where the system picks Haiku, Sonnet, or Opus based on task complexity. This is one of the best ways to reduce your OpenClaw API cost without sacrificing output quality.
The idea is straightforward: simple classification tasks and short responses don't need Opus. A well-configured routing setup sends maybe 70% of requests to Sonnet, 20% to Haiku, and only 10% to Opus for genuinely complex reasoning.
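A quick sanity check on that split: blending the rates from the pricing list at 70/20/10 lands at $2.80 in / $14.00 out per million tokens, roughly 44% below all-Opus, which lines up with the savings I measured:

```python
# Blended per-million-token rates for a 70/20/10 Sonnet/Haiku/Opus mix,
# compared with sending everything to Opus. Rates from the pricing list.
mix = {
    "sonnet": (0.70, 3.00, 15.00),  # (share, input rate, output rate)
    "haiku":  (0.20, 1.00, 5.00),
    "opus":   (0.10, 5.00, 25.00),
}

blended_in = sum(share * in_r for share, in_r, _ in mix.values())
blended_out = sum(share * out_r for share, _, out_r in mix.values())
print(round(blended_in, 2), round(blended_out, 2))  # 2.8 14.0
print(round(1 - blended_in / 5.00, 2))              # 0.44 below all-Opus
```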
RelayPlane's complexity-based routing does this automatically. It analyzes the prompt complexity before forwarding and picks the cheapest model that can handle it. In my setup, this cut costs by roughly 40% compared to running everything on Opus.
The tradeoff is real though. Haiku occasionally fumbles on multi-step tool use. Sonnet handles 90% of agent tasks perfectly. Opus is worth the premium for planning, code architecture, and anything requiring long chains of reasoning. The key is matching model to task, not defaulting to the most expensive option.
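If you want intuition for what a router is doing, here's a deliberately crude sketch. The length and keyword heuristics are mine for illustration; RelayPlane's actual complexity scoring is not documented here and is certainly more sophisticated:

```python
# Sketch of complexity-based routing: pick the cheapest model that can
# plausibly handle the prompt. This heuristic (keywords + length +
# tool-use flag) is an illustration, not RelayPlane's actual algorithm.

HARD_SIGNALS = ("architecture", "refactor", "plan", "prove", "multi-step")

def route(prompt: str, tool_use: bool = False) -> str:
    text = prompt.lower()
    if any(signal in text for signal in HARD_SIGNALS):
        return "opus-4.6"    # long-chain reasoning: worth the premium
    if tool_use or len(prompt) > 500:
        return "sonnet-4.6"  # handles most agent tasks well
    return "haiku-4.5"       # short classification/extraction tasks

print(route("Classify this ticket as bug or feature"))               # haiku-4.5
print(route("Use the search tool to find the file", tool_use=True))  # sonnet-4.6
print(route("Plan the migration architecture"))                      # opus-4.6
```

Even a crude router like this captures the core economics: most traffic never needs to touch the expensive model.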
Setting Budget Caps to Prevent Surprises
After my first surprise bill, I set hard budget caps. You should too.
At the agent level, if you're using Claude Code for coding tasks, the --max-budget-usd flag kills the session when it hits your limit:
```
claude --max-budget-usd 5.00 --message "implement the auth module"
```

At the proxy level, RelayPlane lets you set daily spending caps that reject requests once you've hit the threshold. This is your circuit breaker for runaway agent sessions.
The combination I run: $5 per coding session, $50 daily cap across all agents, with Slack alerts at 80% of each threshold. I haven't had a surprise bill since.
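The cap-plus-alert logic is simple enough to sketch end to end. This mirrors the circuit-breaker behavior described above in generic form; it is not RelayPlane's implementation:

```python
# Sketch of a daily spending cap with an alert threshold, mirroring the
# proxy-level circuit breaker described above. Illustrative only.
class BudgetGuard:
    def __init__(self, daily_cap_usd: float, alert_at: float = 0.80):
        self.cap = daily_cap_usd
        self.alert_at = alert_at
        self.spent = 0.0
        self.alerted = False

    def record(self, cost_usd: float) -> str:
        if self.spent + cost_usd > self.cap:
            return "reject"  # circuit breaker: refuse the request
        self.spent += cost_usd
        if not self.alerted and self.spent >= self.cap * self.alert_at:
            self.alerted = True
            return "alert"   # e.g. fire a Slack notification at 80%
        return "ok"

guard = BudgetGuard(daily_cap_usd=50.00)
print(guard.record(30.00))  # ok
print(guard.record(12.00))  # alert (crossed 80% of $50)
print(guard.record(10.00))  # reject (would exceed $50)
```

Checking the threshold before spending, rather than after, is the design choice that makes this a true circuit breaker instead of a lagging alarm.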
The Bottom Line
OpenClaw API costs are Anthropic costs. The platform passes through every token at face value. But without per-request tracking, you're flying blind on where those tokens go, and context window creep will inflate your bill faster than you expect.
My stack for cost control: short sessions, aggressive sub-agent splitting, complexity-based model routing, and RelayPlane for full per-request visibility. Total setup time is under 10 minutes, and it paid for itself in the first week.
Your agents are burning tokens right now. The question is whether you know where.
RelayPlane is open source and free to use locally. The npm package is @relayplane/proxy. No account required to get started.