How to Set Up a Proxy for OpenClaw Agents and Cut API Costs
If you're running OpenClaw agents, your API bill climbs every month, and most of the spend is wasteful. Status checks, file reads, formatting tasks, simple rewrites -- they're all hitting Opus or GPT-4o because that's what's configured. The fix is an intelligent local proxy that classifies each request by complexity and routes it to the cheapest model that can handle it. Your agents don't notice. Your wallet does.
What Is an AI Proxy?
An AI proxy sits between your application and your AI providers. Every API call passes through it. Instead of blindly forwarding requests to whatever model you configured, the proxy inspects the task, classifies its complexity, and routes it to the right model.
This isn't a load balancer. It's a cost intelligence layer.
When OpenClaw sends a request to ANTHROPIC_BASE_URL or OPENAI_BASE_URL, the proxy intercepts it. Simple task like reading a config file? Route to Haiku or GPT-4o-mini at a fraction of the cost. Complex debugging session? Send it to Opus. The routing is rule-based and deterministic -- you define the complexity mappings, and the proxy enforces them on every request.
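To make the classification step concrete, here's a minimal Python sketch of rule-based complexity routing. This is illustrative only, not RelayPlane's actual code; the keyword hints, length thresholds, and tier-to-model mapping are assumptions for demonstration.

```python
# Illustrative rule-based complexity classifier -- not RelayPlane's
# implementation. Hints, thresholds, and model names are assumptions.

ROUTES = {
    "simple": "anthropic/claude-3-haiku",
    "medium": "anthropic/claude-3.5-sonnet",
    "complex": "anthropic/claude-3-opus",
}

COMPLEX_HINTS = ("debug", "architecture", "refactor", "design")
SIMPLE_HINTS = ("read", "status", "format", "rewrite")

def classify(prompt: str) -> str:
    """Bucket a request into a complexity tier by keywords and length."""
    text = prompt.lower()
    if any(h in text for h in COMPLEX_HINTS) or len(text) > 4000:
        return "complex"
    if any(h in text for h in SIMPLE_HINTS) and len(text) < 500:
        return "simple"
    return "medium"

def route(prompt: str) -> str:
    """Map the tier to the cheapest model configured for it."""
    return ROUTES[classify(prompt)]

print(route("read the config file"))       # simple tier -> Haiku
print(route("debug this race condition"))  # complex tier -> Opus
```

Because the rules are deterministic, the same request always routes to the same tier, which is what makes the proxy's behavior predictable and auditable.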
RelayPlane supports 11 direct providers out of the box: Anthropic, OpenAI, Google Gemini, xAI/Grok, OpenRouter, DeepSeek, Groq, Mistral, Together, Fireworks, and Perplexity. Each gets native API routing with its own base URL and key support. No generic HTTP forwarding -- actual provider-aware routing.
Cost Breakdown: Before vs After Proxy Routing
Here's a typical OpenClaw agent workload breakdown to give you a sense of where the money goes:
| Task Type | % of Requests | Examples | Without Proxy | With Proxy |
|---|---|---|---|---|
| Simple | ~60% | File reads, status checks, formatting, short rewrites | Opus/GPT-4o ($$$) | Haiku/GPT-4o-mini ($) |
| Medium | ~25% | Code review, summaries, data extraction | Opus/GPT-4o ($$$) | Sonnet/GPT-4o ($$) |
| Complex | ~15% | Architecture decisions, hard debugging, long-form creative | Opus/GPT-4o ($$$) | Opus/GPT-4o ($$$) |
The math is straightforward. If you're spending $200/mo:
- ~$120 goes to simple tasks hitting expensive models. Route those to Haiku-class models and it drops to roughly $10-15.
- ~$50 goes to medium tasks. Sonnet-class routing brings that to about $30.
- ~$30 stays the same -- complex tasks that actually need the power.
Illustrative result: $200/mo becomes $55-75/mo. Your actual savings depend on your workload mix. These numbers are meant to show the shape of the savings, not make a guarantee.
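The arithmetic above can be checked directly. A short Python version of the illustrative example, with per-tier reductions (simple drops ~90%, medium ~40%, complex unchanged) taken from the bullets:

```python
# Worked version of the illustrative $200/mo example from the text.
monthly_spend = 200.0
mix = {"simple": 0.60, "medium": 0.25, "complex": 0.15}

before = {tier: monthly_spend * share for tier, share in mix.items()}
# Assumed per-tier reductions from routing to cheaper models:
# simple drops ~90%, medium ~40%, complex stays unchanged.
after = {
    "simple": before["simple"] * 0.10,   # ~$120 -> ~$12
    "medium": before["medium"] * 0.60,   # ~$50 -> ~$30
    "complex": before["complex"],        # ~$30 stays
}

print(round(sum(before.values()), 2))  # 200.0
print(round(sum(after.values()), 2))   # 72.0 -- inside the $55-75 band
```

Swap in your own request mix and reduction factors to estimate your workload; the shape of the savings holds as long as most requests are simple.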
Setting Up RelayPlane as Your OpenClaw Proxy
This takes about 5 minutes. Four steps.
Step 1: Install RelayPlane
```bash
npm install -g @relayplane/proxy
```

Requires Node.js 18+. That's the only dependency.
Step 2: Initialize Your Configuration
```bash
relayplane init
```

This creates your config file with sensible defaults. It'll ask which providers you use and generate the routing rules. The default setup uses complexity-based routing with three tiers:
```yaml
routing:
  mode: complexity
  models:
    simple: anthropic/claude-3-haiku
    medium: anthropic/claude-3.5-sonnet
    complex: anthropic/claude-3-opus
  cascade:
    enabled: true
    strategy: cost  # try cheapest first, fall back on failure
```

Other routing modes: quality (always use the best model), fast (lowest latency), or cost (always cheapest). Complexity mode is where the savings live.
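The cost cascade amounts to trying models in ascending price order and escalating on failure. A minimal Python sketch of that strategy, not RelayPlane's internals; `call_model` is a hypothetical stand-in for a provider call:

```python
# Sketch of a cost-first cascade: try the cheapest model first, fall
# back on failure. Illustrative only -- not RelayPlane's internals.

CASCADE = [  # ascending cost order
    "anthropic/claude-3-haiku",
    "anthropic/claude-3.5-sonnet",
    "anthropic/claude-3-opus",
]

def cascade_call(prompt, call_model):
    """Try each model in cost order; return the first success."""
    last_error = None
    for model in CASCADE:
        try:
            return model, call_model(model, prompt)
        except RuntimeError as exc:
            last_error = exc  # cheaper model failed; escalate
    raise RuntimeError("all models failed") from last_error

# Simulate a run where Haiku fails and Sonnet succeeds.
def fake_call(model, prompt):
    if "haiku" in model:
        raise RuntimeError("refused")
    return "ok"

model, result = cascade_call("summarize this diff", fake_call)
print(model)  # anthropic/claude-3.5-sonnet
```

The point of the cost strategy is that the expensive model is only reached when the cheaper ones actually fail, so you pay Opus prices only for Opus-grade work.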
Step 3: Start the Proxy
```bash
relayplane start
```

The proxy starts on localhost:4100 by default. You'll see a dashboard at http://localhost:4100 with real-time cost tracking, model breakdowns, and routing decisions.
Now point OpenClaw at the proxy by setting these environment variables before starting your agents:
```bash
export ANTHROPIC_BASE_URL=http://localhost:4100
export OPENAI_BASE_URL=http://localhost:4100
```

That's it. OpenClaw sends requests to the proxy thinking it's talking to the provider directly. The proxy handles routing, tracking, and forwarding.
Step 4: Verify It's Working
```bash
relayplane stats
```

This shows a breakdown of requests by model, provider, cost, and routing decisions. Run a few OpenClaw tasks and check the stats to confirm requests are being routed to different models based on complexity.
The dashboard at http://localhost:4100 presents the same data visually: Model Breakdown, Provider Status, Recent Runs, and your routing configuration.
Advanced: Multi-Provider Routing
The real power shows up when you route different task types to entirely different providers.
```yaml
routing:
  mode: complexity
  models:
    simple: groq/llama-3-8b               # blazing fast, nearly free
    medium: anthropic/claude-3.5-sonnet   # strong all-rounder
    complex: openai/gpt-4-turbo           # heavy reasoning
  cascade:
    enabled: true
    strategy: cost
  fallback:
    - anthropic/claude-3-opus
    - openai/gpt-4o
```

Budget Controls
You can set budget caps to prevent runaway costs:
```bash
relayplane budget --daily 10 --action warn
relayplane budget --hourly 2 --action block
```

Budget enforcement supports four actions:
- block -- hard stop, no more requests
- warn -- log and continue
- downgrade -- force a cheaper model
- alert -- send a notification
Per-request limits are also available if you want to catch runaway loops.
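As a mental model, the four actions reduce to a simple decision once a cap is hit. A Python sketch of the idea, not RelayPlane's implementation; the cap value and the downgrade target are assumptions:

```python
# Sketch of budget-cap actions -- illustrative, not RelayPlane code.
# The cap and the downgrade model are assumptions.

DAILY_CAP = 10.0  # dollars

def enforce(spent_today: float, action: str, model: str) -> str:
    """Apply the configured action once the daily cap is reached."""
    if spent_today < DAILY_CAP:
        return model  # under budget: proceed unchanged
    if action == "block":
        raise RuntimeError("daily budget exhausted")
    if action == "warn":
        print(f"warning: ${spent_today:.2f} spent today")
        return model
    if action == "downgrade":
        return "anthropic/claude-3-haiku"  # force a cheaper model
    if action == "alert":
        # placeholder for a notification hook
        return model
    return model

print(enforce(12.0, "downgrade", "anthropic/claude-3-opus"))
# -> anthropic/claude-3-haiku: over budget, so the request is
#    forced onto a cheaper model instead of being rejected
```

In practice, downgrade is the gentlest option for agents that must keep running, while block is the right choice for unattended overnight jobs.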
Anomaly Detection
RelayPlane includes built-in anomaly detection that catches cost spikes, token explosions, and runaway loops automatically. If an agent gets stuck in a retry loop burning tokens, the proxy flags it.
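One simple way to picture this kind of detection: compare each request's token count against a rolling baseline and flag large multiples. A toy Python sketch, not RelayPlane's actual detector; the window size and threshold factor are assumptions:

```python
# Toy token-spike detector -- illustrates the idea, not RelayPlane's
# detector. Window size and threshold factor are assumptions.
from collections import deque

class SpikeDetector:
    def __init__(self, window: int = 20, factor: float = 5.0):
        self.history = deque(maxlen=window)  # recent token counts
        self.factor = factor                 # spike threshold multiple

    def check(self, tokens: int) -> bool:
        """Return True if this request's token count looks anomalous."""
        if len(self.history) >= 5:  # need a minimal baseline first
            baseline = sum(self.history) / len(self.history)
            if tokens > baseline * self.factor:
                self.history.append(tokens)
                return True
        self.history.append(tokens)
        return False

d = SpikeDetector()
for t in [400, 500, 450, 480, 520]:
    d.check(t)            # build a baseline of normal-sized requests
print(d.check(30_000))    # True -- token explosion flagged
```

A retry loop shows up the same way: a burst of requests whose combined token count dwarfs the rolling baseline.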
Run on Boot
For persistent setups:
```bash
relayplane autostart
```

This installs a systemd (Linux) or launchd (macOS) service so the proxy starts on boot.
Privacy and Telemetry
RelayPlane runs entirely on your machine. Your prompts, responses, and API keys never leave localhost. There is no cloud component in the free tier -- the proxy is a local process, the dashboard is a local web server.
Telemetry is on by default. It sends anonymous aggregate data: token counts, latency measurements, and model usage patterns. No prompt content, no API keys, no personally identifiable information. You can audit what gets sent or disable it:
```bash
relayplane telemetry       # see current status and what's collected
relayplane telemetry off   # disable completely
```

The proxy doesn't add any telemetry to your OpenClaw setup. It just passes requests through.
What to Do Next
Once the proxy is running, check your dashboard after a day or two of normal usage. You'll see a clear breakdown of how many requests got routed to cheaper models and what that saved you.
- Install and run the proxy -- takes 5 minutes, free tier has no limits on requests
- Tune your routing -- after a few days, review the stats and adjust your complexity model mappings based on actual task distribution
- Set budgets -- configure daily and hourly limits so costs never run away
```bash
npm install -g @relayplane/proxy && relayplane init && relayplane start
```

RelayPlane is open source and free to use locally. Cost tracking, local dashboard, routing, caching, and 7-day history are all included. No account required to get started.