
How to Set Up a Proxy for OpenClaw Agents and Cut API Costs

Matt Turley · 8 min read

If you're running OpenClaw agents, your API bill climbs every month, and most of the spend is wasteful. Status checks, file reads, formatting tasks, simple rewrites -- they're all hitting Opus or GPT-4o because that's what's configured. The fix is an intelligent local proxy that classifies each request by complexity and routes it to the cheapest model that can handle it. Your agents don't notice. Your wallet does.


What Is an AI Proxy?

An AI proxy sits between your application and your AI providers. Every API call passes through it. Instead of blindly forwarding requests to whatever model you configured, the proxy inspects the task, classifies its complexity, and routes it to the right model.

This isn't a load balancer. It's a cost intelligence layer.

When OpenClaw sends a request to ANTHROPIC_BASE_URL or OPENAI_BASE_URL, the proxy intercepts it. A simple task like reading a config file? Route it to Haiku or GPT-4o-mini at a fraction of the cost. A complex debugging session? Send it to Opus. The routing is rule-based and deterministic -- you define the complexity mappings, and the proxy enforces them on every request.
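To make "rule-based and deterministic" concrete, here is a minimal sketch of what complexity classification can look like. The tier names match the article, but the keyword markers and thresholds are illustrative assumptions, not RelayPlane's actual heuristics:

```python
# Illustrative rule-based complexity classifier. The markers and
# thresholds below are assumptions for the sake of example.

def classify(prompt: str, max_tokens: int = 1024) -> str:
    """Map a request to a complexity tier with deterministic rules."""
    simple_markers = ("read", "status", "format", "rename", "list")
    complex_markers = ("debug", "architecture", "design", "refactor")

    text = prompt.lower()
    if any(m in text for m in complex_markers) or max_tokens > 4096:
        return "complex"
    if any(m in text for m in simple_markers) and len(text) < 500:
        return "simple"
    return "medium"

print(classify("Read the config file and report its contents"))  # simple
print(classify("Debug this race condition in the scheduler"))    # complex
```

Because the rules are pure functions of the request, the same input always routes the same way -- which is what makes the behavior auditable.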

RelayPlane supports 11 direct providers out of the box: Anthropic, OpenAI, Google Gemini, xAI/Grok, OpenRouter, DeepSeek, Groq, Mistral, Together, Fireworks, and Perplexity. Each gets native API routing with its own base URL and key support. No generic HTTP forwarding -- actual provider-aware routing.


Cost Breakdown: Before vs After Proxy Routing

Here's a typical OpenClaw agent workload breakdown to give you a sense of where the money goes:

| Task Type | % of Requests | Examples | Without Proxy | With Proxy |
|-----------|---------------|----------|---------------|------------|
| Simple | ~60% | File reads, status checks, formatting, short rewrites | Opus/GPT-4o ($$$) | Haiku/GPT-4o-mini ($) |
| Medium | ~25% | Code review, summaries, data extraction | Opus/GPT-4o ($$$) | Sonnet/GPT-4o ($$) |
| Complex | ~15% | Architecture decisions, hard debugging, long-form creative | Opus/GPT-4o ($$$) | Opus/GPT-4o ($$$) |

The math is straightforward. If you're spending $200/mo:

  • ~$120 goes to simple tasks hitting expensive models. Route those to Haiku-class models and it drops to roughly $10-15.
  • ~$50 goes to medium tasks. Sonnet-class routing brings that to about $30.
  • ~$30 stays the same -- complex tasks that actually need the power.

Illustrative result: $200/mo becomes $70-75/mo. Your actual savings depend on your workload mix. These numbers are meant to show the shape of the savings, not make a guarantee.
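The bullet figures above sum up in a few lines of arithmetic (these are the article's rough illustrative numbers, not measured prices):

```python
# Rough before/after monthly spend, using the illustrative figures above.
before = {"simple": 120, "medium": 50, "complex": 30}  # $/mo on expensive models
after = {
    "simple": (10, 15),   # Haiku-class routing
    "medium": (30, 30),   # Sonnet-class routing
    "complex": (30, 30),  # unchanged: these tasks need the power
}

total_before = sum(before.values())
low = sum(lo for lo, _ in after.values())
high = sum(hi for _, hi in after.values())

print(f"${total_before}/mo -> ${low}-{high}/mo")  # $200/mo -> $70-75/mo
```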


Setting Up RelayPlane as Your OpenClaw Proxy

This takes about 5 minutes. Four steps.

Step 1: Install RelayPlane

npm install -g @relayplane/proxy

Requires Node.js 18+. That's the only dependency.

Step 2: Initialize Your Configuration

relayplane init

This creates your config file with sensible defaults. It'll ask which providers you use and generate the routing rules. The default setup uses complexity-based routing with three tiers:

routing:
  mode: complexity
  models:
    simple: anthropic/claude-3-haiku
    medium: anthropic/claude-3.5-sonnet
    complex: anthropic/claude-3-opus
  cascade:
    enabled: true
    strategy: cost  # try cheapest first, fall back on failure

Other routing modes: quality (always use the best model), fast (lowest latency), or cost (always cheapest). Complexity mode is where the savings live.

Step 3: Start the Proxy

relayplane start

The proxy starts on localhost:4100 by default. You'll see a dashboard at http://localhost:4100 with real-time cost tracking, model breakdowns, and routing decisions.

Now point OpenClaw at the proxy by setting these environment variables before starting your agents:

export ANTHROPIC_BASE_URL=http://localhost:4100
export OPENAI_BASE_URL=http://localhost:4100

That's it. OpenClaw sends requests to the proxy thinking it's talking to the provider directly. The proxy handles routing, tracking, and forwarding.

Step 4: Verify It's Working

relayplane stats

This shows a breakdown of requests by model, provider, cost, and routing decisions. Run a few OpenClaw tasks and check the stats to confirm requests are being routed to different models based on complexity.

The dashboard at http://localhost:4100 gives you a visual breakdown including Model Breakdown, Provider Status, Recent Runs, and your routing configuration.


Advanced: Multi-Provider Routing

The real power shows up when you route different task types to entirely different providers.

routing:
  mode: complexity
  models:
    simple: groq/llama-3-8b             # blazing fast, nearly free
    medium: anthropic/claude-3.5-sonnet  # strong all-rounder
    complex: openai/gpt-4-turbo          # heavy reasoning
  cascade:
    enabled: true
    strategy: cost
    fallback:
      - anthropic/claude-3-opus
      - openai/gpt-4o
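The cascade above (`strategy: cost` plus a fallback list) behaves roughly like the loop below: try the cheapest configured model first, and on failure walk the fallback chain in order. This is a hedged sketch of the idea, with `call_model` standing in for the real provider call:

```python
def cascade(prompt, primary, fallbacks, call_model):
    """Try the primary model first, then each fallback in order."""
    for model in [primary, *fallbacks]:
        try:
            return model, call_model(model, prompt)
        except RuntimeError:  # stand-in for a provider error or timeout
            continue
    raise RuntimeError("all models in the cascade failed")

# Simulated provider where the cheap primary happens to be down:
def fake_call(model, prompt):
    if model == "groq/llama-3-8b":
        raise RuntimeError("provider overloaded")
    return f"{model} answered"

used, reply = cascade("hello", "groq/llama-3-8b",
                      ["anthropic/claude-3-opus", "openai/gpt-4o"], fake_call)
print(used)  # anthropic/claude-3-opus
```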

Budget Controls

You can set budget caps to prevent runaway costs:

relayplane budget --daily 10 --action warn
relayplane budget --hourly 2 --action block

Budget enforcement supports four actions:

  • block -- hard stop, no more requests
  • warn -- log and continue
  • downgrade -- force a cheaper model
  • alert -- send a notification

Per-request limits are also available if you want to catch runaway loops.
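The four actions amount to a simple decision against a spend counter. The action names mirror the CLI flags above, but the logic here is an illustrative sketch, not RelayPlane's implementation:

```python
# Illustrative budget enforcement: compare spend to a limit, apply an action.
def check_budget(spent: float, limit: float, action: str) -> str:
    if spent < limit:
        return "allow"
    return {
        "block": "block",                 # hard stop
        "warn": "allow",                  # log and continue
        "downgrade": "use-cheaper-model", # force a cheaper model
        "alert": "allow",                 # notify and continue
    }[action]

print(check_budget(1.50, 2.00, "block"))      # allow
print(check_budget(2.10, 2.00, "block"))      # block
print(check_budget(12.0, 10.0, "downgrade"))  # use-cheaper-model
```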

Anomaly Detection

RelayPlane includes built-in anomaly detection that catches cost spikes, token explosions, and runaway loops automatically. If an agent gets stuck in a retry loop burning tokens, the proxy flags it.
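A cost-spike detector can be as simple as comparing each new sample to a recent baseline. This is a generic sketch of the technique, with an assumed threshold factor; it is not RelayPlane's actual detector:

```python
from statistics import mean

def is_spike(history, current, factor=3.0, min_samples=5):
    """Flag a cost sample that exceeds factor x the recent average."""
    if len(history) < min_samples:
        return False  # not enough baseline data yet
    return current > factor * mean(history)

baseline = [0.02, 0.03, 0.02, 0.04, 0.03]  # recent $ per request
print(is_spike(baseline, 0.03))  # normal request
print(is_spike(baseline, 0.50))  # retry loop burning tokens
```

The same shape of check works for token counts per request, which is how a stuck retry loop shows up before the bill does.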

Run on Boot

For persistent setups:

relayplane autostart

This installs a systemd (Linux) or launchd (macOS) service so the proxy starts on boot.


Privacy and Telemetry

RelayPlane runs entirely on your machine. Your prompts, responses, and API keys never leave localhost. There is no cloud component in the free tier -- the proxy is a local process, the dashboard is a local web server.

Telemetry is on by default. It sends anonymous aggregate data: token counts, latency measurements, and model usage patterns. No prompt content, no API keys, no personally identifiable information. You can audit what gets sent or disable it:

relayplane telemetry        # see current status and what's collected
relayplane telemetry off    # disable completely

The proxy doesn't add any telemetry to your OpenClaw setup. It just passes requests through.


What to Do Next

Once the proxy is running, check your dashboard after a day or two of normal usage. You'll see a clear breakdown of how many requests got routed to cheaper models and what that saved you.

  1. Install and run the proxy -- takes 5 minutes, free tier has no limits on requests
  2. Tune your routing -- after a few days, review the stats and adjust your complexity model mappings based on actual task distribution
  3. Set budgets -- configure daily and hourly limits so costs never run away

The whole setup in one line:

npm install -g @relayplane/proxy && relayplane init && relayplane start

RelayPlane is open source and free to use locally. Cost tracking, local dashboard, routing, caching, and 7-day history are all included. No account required to get started.