RelayPlane sits between your OpenClaw agents and the LLM providers they call. It tracks every dollar, shows you exactly where it goes, and gives you the tools to cut waste.
On February 18, Anthropic banned using Max plan tokens in OpenClaw and similar tools. Users who were paying $200/month now face $600–2,000+ in API costs for the same usage.
But the cost crisis started before the ban. The real problem: your agent uses the most expensive model for everything. Code formatting? Opus. Reading a file? Opus. Simple questions? Opus.
Every LLM request flows through RelayPlane. Cost per request, model used, task type, tokens consumed, latency — all tracked automatically. Your first "aha" moment: seeing that 73% of your spend is on the wrong model.
A policy engine classifies each task and routes it to the most cost-effective model. Simple file reads go to Haiku ($0.25/M tokens). Complex architecture goes to Opus ($15/M tokens). You set the rules — or use our defaults.
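To make that concrete, here is a minimal sketch of what such a routing table could look like. The rule shape and tier names are illustrative assumptions, not RelayPlane's actual policy format; the Haiku and Opus prices come from the text above, the Sonnet price is an added assumption.

```ts
// Illustrative only: a hypothetical routing table from task complexity to model.
// Haiku/Opus prices match the text above; the Sonnet price is an assumption.
type Complexity = "simple" | "moderate" | "complex";

interface Route {
  model: string;             // target model for this tier
  inputPricePerMTok: number; // USD per million input tokens
}

const defaultRoutes: Record<Complexity, Route> = {
  simple:   { model: "claude-haiku",  inputPricePerMTok: 0.25 },
  moderate: { model: "claude-sonnet", inputPricePerMTok: 3.0 },
  complex:  { model: "claude-opus",   inputPricePerMTok: 15.0 },
};

// A custom rule might pin a task type to a minimum tier, e.g. always send
// code generation to at least "moderate" regardless of its complexity score.
```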
Coming soon: opt in to the collective mesh. Share anonymized routing outcomes with other agents on the network. As the mesh grows, routing gets smarter for everyone — automatically.
npm install -g @relayplane/proxy && relayplane init
Built for OpenClaw. Works with any agent framework.
No config files. No BASE_URL changes. No risk — if RelayPlane goes down, your agent keeps working.
Run relayplane stats in your terminal for a quick cost summary.
Supports: Anthropic · OpenAI · Google Gemini · xAI/Grok · OpenRouter · DeepSeek · Groq · Mistral · Together · Fireworks · Perplexity
We learned this the hard way. Early versions hijacked provider URLs — one crash took everything down for 8 hours. Never again.
RelayPlane uses a circuit breaker architecture. After 3 failures, all traffic bypasses the proxy automatically. Your agent talks directly to the provider. When RelayPlane recovers, traffic resumes. No manual intervention.
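A rough TypeScript sketch of that circuit-breaker behaviour. The three-failure threshold comes from the text; the class shape and recovery hook are illustrative assumptions, not RelayPlane's internals.

```ts
// Minimal circuit breaker sketch: after 3 consecutive proxy failures,
// requests go straight to the provider until the proxy is healthy again.
class CircuitBreaker {
  private failures = 0;
  private open = false; // open = bypass the proxy

  constructor(private readonly threshold = 3) {}

  async call<T>(viaProxy: () => Promise<T>, direct: () => Promise<T>): Promise<T> {
    if (this.open) {
      // Circuit is open: talk directly to the provider.
      return direct();
    }
    try {
      const result = await viaProxy();
      this.failures = 0; // a success resets the counter
      return result;
    } catch {
      this.failures += 1;
      if (this.failures >= this.threshold) this.open = true;
      // Fall back to the provider for this request as well.
      return direct();
    }
  }

  // Called by a health check when the proxy recovers; traffic resumes.
  reset() {
    this.failures = 0;
    this.open = false;
  }
}
```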
When you opt into the mesh, RelayPlane will report anonymized routing outcomes: "task type X + model Y = success/failure." No prompts. No code. No user data. Just operational signals. The data below is projected, not live.
relayplane mesh contribute on
"I was spending $200+/month running an agent swarm and had zero visibility into where the money was going. Turns out 73% of my requests were using Opus for tasks Haiku could handle." — Matt Turley, Continuum
No. RelayPlane uses a circuit breaker architecture. If the proxy fails for any reason, all traffic automatically bypasses it and goes directly to your LLM provider. Your agent doesn't even notice. If RelayPlane can't route, it passes through to your default model. Worst case: you pay what you would have paid anyway. We learned this lesson the hard way and built the safety model first.
Only anonymized metadata: task type, token count, model used, success/fail. Never your prompts, code, or outputs. Your actual prompts never leave your machine. On the free tier, nothing leaves your machine at all — everything runs locally. If you opt into the Contributor tier, only these anonymized routing signals are shared.
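For illustration only, a routing signal shared under the Contributor tier might look roughly like this. The field names are assumptions based on the description above, not RelayPlane's actual format.

```ts
// Hypothetical shape of a shared routing signal: operational metadata only,
// never prompts, code, or outputs.
interface RoutingSignal {
  taskType: string;                              // e.g. "summarization"
  complexity: "simple" | "moderate" | "complex";
  model: string;                                 // e.g. "claude-haiku"
  inputTokens: number;                           // token count only, never content
  outcome: "success" | "failure";
}

const example: RoutingSignal = {
  taskType: "summarization",
  complexity: "simple",
  model: "claude-haiku",
  inputTokens: 1842,
  outcome: "success",
};
```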
RelayPlane classifies each request by task complexity (simple, moderate, complex) and routes it to the cheapest model that can handle it. File reads and simple edits go to Haiku. Code generation goes to Sonnet. Complex architecture and reasoning stays on Opus. You can customize the rules or use our defaults.
Yes. RelayPlane supports any OpenAI-compatible API — Claude, GPT-4o, Gemini, Mistral, open-source models via OpenRouter. The routing engine is model-agnostic.
It depends on your usage pattern, but most users see 40-70% cost reduction. The biggest savings come from routing simple tasks (which are typically 60-70% of all requests) to cheaper models. You'll see exact numbers in your dashboard within the first hour.
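As a back-of-the-envelope check on that range: the traffic split below is an assumption for illustration, with the Haiku and Opus prices taken from above.

```ts
// Back-of-the-envelope savings estimate. The 65% split is an illustrative
// assumption, not a RelayPlane measurement.
const opusPrice = 15;     // USD per million input tokens
const haikuPrice = 0.25;  // USD per million input tokens
const simpleShare = 0.65; // fraction of tokens that are simple tasks

// Cost per million tokens before (everything on Opus) and after routing.
const before = opusPrice;
const after = (1 - simpleShare) * opusPrice + simpleShare * haikuPrice;

const savings = 1 - after / before; // ≈ 0.64, i.e. roughly 64% cheaper
console.log(`Estimated savings: ${(savings * 100).toFixed(0)}%`);
```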
Different layer entirely. OpenRouter is a multi-provider gateway: you pick the model, it routes to the cheapest provider for that model. RelayPlane picks the right model for the task. RelayPlane is local-first (your machine, your data). OpenRouter is a cloud service you send all prompts through. They're complementary: you can use OpenRouter as a provider behind RelayPlane. RelayPlane adds cost tracking, task classification, and a local dashboard on top.
LiteLLM is a unified API adapter: call any provider with one SDK. RelayPlane does that and adds intelligent routing, cost tracking, and a dashboard. LiteLLM requires code changes (import litellm). RelayPlane is a transparent proxy, zero code changes needed. LiteLLM is a library you integrate. RelayPlane is infrastructure you deploy.
Two-layer pre-flight classification with negligible latency overhead (no extra LLM calls). First, task type: regex pattern matching on the prompt across 9 categories (code generation, summarization, analysis, etc.), completed in under 5ms. Second, complexity: structural signals like code blocks, token count, and multi-step instructions, scored as simple, moderate, or complex. Routing rules map task type + complexity to a model tier. Cascade mode starts with the cheapest model and auto-escalates on uncertainty or refusal patterns.
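A simplified sketch of those two layers. The categories, regexes, structural signals, and thresholds here are illustrative stand-ins, not RelayPlane's actual heuristics.

```ts
// Layer 1: task type via regex pattern matching on the prompt (no LLM call).
// Illustrative patterns only; the real classifier covers 9 categories.
const taskPatterns: Record<string, RegExp> = {
  code_generation: /\b(write|implement|refactor)\b.*\b(function|class|test)\b/i,
  summarization: /\b(summari[sz]e|tl;?dr)\b/i,
  analysis: /\b(analy[sz]e|compare|explain why)\b/i,
};

function classifyTask(prompt: string): string {
  for (const [type, pattern] of Object.entries(taskPatterns)) {
    if (pattern.test(prompt)) return type;
  }
  return "general";
}

// Layer 2: complexity from structural signals (code blocks, length, multi-step cues).
function classifyComplexity(prompt: string): "simple" | "moderate" | "complex" {
  let score = 0;
  if ((prompt.match(/`/g) ?? []).length >= 6) score += 2;           // likely contains code blocks
  if (prompt.length > 4000) score += 2;                             // long prompt, rough token proxy
  if (/\b(first|then|finally|step \d)\b/i.test(prompt)) score += 1; // multi-step instructions
  if (score >= 3) return "complex";
  if (score >= 1) return "moderate";
  return "simple";
}

// Routing rules then map (classifyTask(p), classifyComplexity(p)) to a model tier.
```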
No. RelayPlane runs entirely on your machine. No cloud dependency. No telemetry unless you opt into the mesh (coming soon). MIT licensed, fully auditable.
Nothing. Remove the proxy, your agents talk directly to providers again. No lock-in, no migration, no data hostage. It's MIT licensed — you can fork it and run your own if you want.
RelayPlane automatically retries with a better model. You pay for both calls, but that's still cheaper than always using Opus. Collective failure learning is on the roadmap — for now, the proxy logs these so you can adjust your routing rules.
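A rough sketch of that escalate-and-retry step. The tier order, refusal heuristic, and callModel stub are hypothetical, purely to show the shape of the cascade.

```ts
// Hypothetical escalation: retry one tier up when the cheap model refuses.
// You pay for both calls, but the common case (the cheap model succeeds) stays cheap.
const tiers = ["claude-haiku", "claude-sonnet", "claude-opus"];

// Stand-in for a real provider call.
async function callModel(model: string, prompt: string): Promise<string> {
  return `response from ${model} to: ${prompt.slice(0, 40)}`;
}

function looksLikeRefusal(text: string): boolean {
  return text.trim().length === 0 || /\b(i can(no|')t|i'm unable to)\b/i.test(text);
}

async function routeWithEscalation(prompt: string): Promise<string> {
  let answer = "";
  for (const model of tiers) {
    answer = await callModel(model, prompt);
    if (!looksLikeRefusal(answer)) return answer;               // good enough, stop escalating
    console.warn(`Refusal pattern from ${model}, escalating`);  // logged for rule tuning
  }
  return answer; // even the top tier refused; return what we have
}
```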
The local proxy works forever: it's MIT-licensed software running on your machine. Only the mesh network would stop, and you'd keep the roughly 30% savings that static routing rules deliver on their own.
Run relayplane stats or check the dashboard. We show you exactly how much you've saved vs. what you would have spent.
Install RelayPlane. See your costs in 3 minutes.
npm install -g @relayplane/proxy && relayplane init