TENANT ISOLATION · SPEC-MATCH · KILL-SWITCH · AUDIT TRAIL · COST INTELLIGENCE

You're overpaying for AI.
Route smarter, locally.

RelayPlane sits between your orchestrator and your agents. Every request verified against spec. Every tenant isolated. Every dollar tracked. One API call to halt any tenant instantly.

Install Free · Read the Governance Spec

Local by default. No telemetry required. · 165 GITHUB STARS

Runs
Monitor and analyze agent runs in real-time
Total Runs: 0 (↗ +14.5% vs last week)
Total Cost: $0 (↗ +11.2% vs last week)
Avg Latency: 0ms (↘ -12.8% vs last week)
Success Rate: 0% (↗ +1.5% vs last week)

Run | Agent | Model | When | Cost | Latency
#a3f7c | coder-main | claude-opus | 5m ago | $0.47 | 1.2s
#b8e2d | coder-main | claude-sonnet | 12m ago | $0.08 | 623ms
#c4f9a | ci-agent | gpt-4o | 25m ago | $0.04 | 30.0s
#d7e1b | research-agent | claude-opus | 45m ago | $0.00 | 0ms
#e2c8f | coder-main | claude-opus | 30s ago | $0.24 | 0ms
Analytics
Cost breakdown, cache savings, and savings analysis

Savings This Week: $168.40 saved vs. unoptimized routing

Cost by Model: Opus $56 · Sonnet $41 · Haiku $32 · GPT-4o $18

Cost per Model (7 days)

Model | Cost
claude-opus | $56.20
claude-sonnet | $41.10
claude-haiku | $31.80
gpt-4o | $18.13
Routing
Live routing decisions
Task | Complexity | Model | Outcome | When
Read file: src/utils/config.ts | Simple | Haiku | saved $0.14 | 2s ago
Generate unit tests for auth module | Medium | Sonnet | saved $0.08 | 18s ago
Refactor database schema for multi-tenancy | Complex | Opus | correct tier | 34s ago
Format JSON output from API response | Simple | Haiku | saved $0.12 | 1m ago
Implement WebSocket reconnection logic | Medium | Sonnet | saved $0.11 | 2m ago
Policies
Active routing rules
Rule | Typical use | Status
if task = SIMPLE then route → haiku | File reads, small edits, simple questions | Active · matched 847 times today
if task = MEDIUM then route → sonnet | Code generation, refactoring, test writing | Active · matched 312 times today
if context > 50K tokens then route → gemini-pro | Large context window: cheaper per token | Active · matched 88 times today
if monthly_spend > $200 then alert via telegram | Budget guardrail: notify before overspend | Active · budget: $147.23 / $200
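Rules like these can be read as an ordered list of conditions where the first match decides the model. The sketch below is illustrative only: the `Request` fields, the `RULES` list, the `route` and `should_alert` helpers, and the Opus fallback are assumptions, not RelayPlane's actual configuration format. The thresholds and model names are taken from the rules shown above.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Request:
    task: str            # classifier label: "SIMPLE", "MEDIUM", or "COMPLEX"
    context_tokens: int  # prompt size in tokens


# (predicate, target model) pairs, checked in the order the rules are listed.
RULES: List[Tuple[Callable[[Request], bool], str]] = [
    (lambda r: r.task == "SIMPLE", "haiku"),
    (lambda r: r.task == "MEDIUM", "sonnet"),
    (lambda r: r.context_tokens > 50_000, "gemini-pro"),
]

DEFAULT_MODEL = "opus"  # assumption: unmatched (complex) work stays on Opus


def route(req: Request) -> str:
    """Return the model of the first matching rule, else the default."""
    for matches, model in RULES:
        if matches(req):
            return model
    return DEFAULT_MODEL


def should_alert(monthly_spend: float, budget: float = 200.0) -> bool:
    """Budget guardrail: alert only once spend exceeds the budget."""
    return monthly_spend > budget
```

With first-match ordering, a simple task with a very large context would still go to Haiku, so rule order is a design choice worth making explicit.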
THE MATH

What routing actually saves you.

Scenario | Model mix | Weekly cost
Without routing | 100% Opus | $33.75
With RelayPlane routing | 30% Opus · 70% Sonnet | $14.85
You save per week | | $18.90 (56%)

These are illustrative numbers based on a 30% Opus / 70% Sonnet routing estimate. Opus: $15/M tokens · Sonnet: $3/M tokens (Anthropic April 2026 pricing). Your actual savings depend on usage patterns.
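The blend can be checked with a few lines of arithmetic. This sketch assumes one flat per-token price per model (no input/output split, no caching) and infers the weekly token volume from the all-Opus row. The function and variable names are illustrative.

```python
OPUS_PER_M = 15.0    # USD per million tokens, from the note above
SONNET_PER_M = 3.0   # USD per million tokens, from the note above


def weekly_cost(tokens_m: float, opus_share: float) -> float:
    """Cost of tokens_m million tokens split between Opus and Sonnet."""
    sonnet_share = 1.0 - opus_share
    return tokens_m * (opus_share * OPUS_PER_M + sonnet_share * SONNET_PER_M)


# Weekly volume implied by a $33.75 all-Opus bill at $15 per million tokens.
tokens_m = 33.75 / OPUS_PER_M

baseline = weekly_cost(tokens_m, opus_share=1.0)
routed = weekly_cost(tokens_m, opus_share=0.30)
saved = baseline - routed

print(f"baseline=${baseline:.2f} routed=${routed:.2f} "
      f"saved=${saved:.2f} ({saved / baseline:.0%})")
```

Under these flat-rate assumptions, a 30/70 mix costs 44% of the all-Opus baseline (0.30 + 0.70 × 3/15). Real bills also depend on the input/output token mix and on caching.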

THE PROBLEM

Agents are powerful.
Ungoverned agents are dangerous.

When you run an agent swarm commercially — for clients, for production systems, for anything that matters — you need more than cost tracking. You need provable isolation, verifiable output, and a stop button that works in under one second.

Without a governance layer: Client A's runaway agent bleeds into Client B's budget. Agents mark tasks done without meeting spec. A billing emergency has no kill-switch. Auditors ask for logs you don't have.

The gap is governance: tenant isolation, spec-match verification, kill-switch, audit trail.
Before RelayPlane
Task type | Model | Monthly cost | Verdict
Simple tasks (file reads, edits) | Opus | $142/mo | Waste
Medium tasks (code generation) | Opus | $38/mo | Overspend
Complex tasks (architecture) | Opus | $18/mo | Correct
Retried failures | Opus | $24/mo | Preventable

Without RelayPlane: $222/mo
With RelayPlane: $52/mo
You save: $170/mo (77% less)
How it works
Task type | Routing change | Monthly cost | Result
Simple tasks | Opus → Cheapest capable model* | $142 → $12 |
Medium tasks | Opus → Sonnet | $38 → $22 |
Complex tasks | Opus → Opus | $18 → $18 | No change
Retried failures | Eliminated | $24 → $0 | Avoided

*Default routes simple work to Sonnet. With provider keys configured, RelayPlane can route simple tasks to any cheaper capable model you choose.

CASE STUDY · SWARM-AS-SERVICE

How Matt runs a commercial
agent swarm through RelayPlane.

Matt Turley runs Continuum, a commercial agent swarm service. Every client gets their own tenant lane through RelayPlane. Requests from Client A are physically isolated from Client B at the proxy layer — not just logically separated in application code.

Every coder agent task includes its acceptance criteria. When the agent finishes, RelayPlane's spec-match plugin verifies the output before it reaches the verifier. Only passing tasks proceed. Failing tasks retry with escalation — automatically.

When a billing anomaly spikes unexpectedly, Matt calls DELETE /v1/tenants/acme-corp/kill. That tenant's traffic stops within the next request cycle — under one second. No waiting for rate limits or budget ceilings.
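A minimal sketch of that call using only the Python standard library. The base URL `http://localhost:4100` and the `build_kill_request` helper are assumptions. Authentication headers, if the proxy requires them, are omitted because this page does not describe them.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:4100"  # assumption: address of the local proxy


def build_kill_request(tenant_id: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the DELETE request that halts one tenant's traffic."""
    tenant = urllib.parse.quote(tenant_id, safe="")
    url = f"{base_url}/v1/tenants/{tenant}/kill"
    return urllib.request.Request(url, method="DELETE")


req = build_kill_request("acme-corp")

# To send it against a running proxy:
# with urllib.request.urlopen(req, timeout=2) as resp:
#     print(resp.status)
```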

RelayPlane is not a feature of Matt's swarm. It is the moat. Every client runs through it.
What governance unlocks
Concern | Control | Status
Client A traffic | Isolated tenant lane | ✓ Isolated
Task completion | Spec-match verified | ✓ Verified
Runaway agent | Kill-switch halted | ✓ Stopped
Compliance ask | Audit bundle export | ✓ Ready
Cost overrun | Hard budget per tenant | ✓ Blocked
THE MARKET SHIFT

The ban accelerated the need for governance.

When Anthropic stopped supporting Max subscriptions in third-party tools, developers moved to the API and started paying attention to costs. But cost control was just the first layer. Now teams are running production agent swarms for clients and need the full governance stack: isolation, verification, audit, kill-switch.

RelayPlane started as a cost-routing proxy. It is now the governance gateway every commercial agent deployment needs.

HOW IT WORKS

Three pillars. One install.

Pillar 1 · Observe

See everything, per tenant

Every LLM request flows through RelayPlane with full tenant attribution. Cost per request, model used, task type, tokens consumed, latency: all tracked and namespaced by tenant. Per-agent cost tracking identifies each agent by system prompt fingerprint. Tenant dashboards show spend, request volume, and top models — isolated from other tenants. Every trace linked to the tenant that generated it.

  • Per-tenant cost and request tracking
  • Model breakdown by provider and tenant
  • Cache-aware cost tracking
  • Per-agent cost breakdown
  • Task classification
  • Tamper-proof audit trail per tenant
Pillar 2 · Govern

Hard limits. Instant kill-switch.

A layered policy engine with per-tenant enforcement. Hard budget caps block requests when daily limits are hit — no soft warnings, no grace periods unless you configure them. Tenant isolation ensures a runaway agent on one tenant cannot affect others. The kill-switch halts all traffic for a tenant within one request cycle. Anomaly detection catches runaway loops and cost spikes in real time.

  • Per-tenant hard budget caps (daily / monthly)
  • Tenant isolation — physical request queue separation
  • Kill-switch API: halt any tenant instantly via one HTTP call
  • Model allowlist / denylist per tenant
  • Anomaly detection: velocity spikes, repetition loops, token explosions
  • Cross-provider cascade routing (Anthropic, OpenAI, Gemini, Groq, OpenRouter)
  • Cost alerts & webhook delivery
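The interaction between hard caps, the kill-switch, and tenant isolation can be sketched as a per-tenant admission check. Everything here (`TenantGovernor`, its fields, and its methods) is a hypothetical in-memory model for illustration, not RelayPlane's implementation.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class TenantGovernor:
    daily_cap_usd: Dict[str, float]
    spent_today: Dict[str, float] = field(default_factory=lambda: defaultdict(float))
    killed: Set[str] = field(default_factory=set)

    def kill(self, tenant: str) -> None:
        """Kill-switch: every later request from this tenant is refused."""
        self.killed.add(tenant)

    def admit(self, tenant: str, estimated_cost: float) -> bool:
        """Hard cap: refuse a request that would push the tenant over its cap."""
        if tenant in self.killed:
            return False
        cap = self.daily_cap_usd.get(tenant)
        if cap is not None and self.spent_today[tenant] + estimated_cost > cap:
            return False
        return True

    def record(self, tenant: str, actual_cost: float) -> None:
        """Spend is tracked per tenant, so one tenant cannot drain another."""
        self.spent_today[tenant] += actual_cost
```

Because state is keyed by tenant, exhausting one tenant's budget or killing it leaves every other tenant's admission decisions unchanged.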
Pillar 3 · Verify

Spec-match before it ships.

Before any agent marks a task complete, RelayPlane's spec-match plugin evaluates the output against the task's acceptance criteria. The orchestrator POSTs the diff + criteria to /v1/spec-match. A cheap judge model (Haiku by default) evaluates each criterion and returns a structured pass/fail result with per-criterion evidence and confidence scores. Only pass: true results proceed. Failing tasks automatically retry.

  • Acceptance criteria evaluation at task completion
  • Diff + text + screenshot support
  • Per-criterion pass/fail with evidence and confidence
  • Blocker vs. major vs. minor severity scoring
  • Weighted 0–100 score
  • Trace linked to tenant audit trail
  • Configurable judge model (default: Haiku for cost)
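A sketch of what the orchestrator side of a spec-match call might look like. The `/v1/spec-match` path and the `pass` flag come from the description above. The request field names (`diff`, `criteria`), the response shape in `example_result`, and the helper functions are assumptions made for illustration.

```python
import json
from typing import Any, Dict, List


def build_spec_match_payload(diff: str, criteria: List[str]) -> bytes:
    """JSON body to POST to /v1/spec-match (field names are assumed)."""
    return json.dumps({"diff": diff, "criteria": criteria}).encode("utf-8")


def should_proceed(result: Dict[str, Any]) -> bool:
    """Only an explicit pass: true lets the task move on."""
    return result.get("pass") is True


def failed_criteria(result: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Criteria the judge marked as failing, useful as retry feedback."""
    return [c for c in result.get("criteria", []) if not c.get("pass")]


# Hypothetical response, shaped after the feature list above.
example_result = {
    "pass": False,
    "score": 62,
    "criteria": [
        {"criterion": "Reconnects after socket close", "pass": True,
         "severity": "major", "confidence": 0.93, "evidence": "retry loop added"},
        {"criterion": "Uses exponential backoff", "pass": False,
         "severity": "blocker", "confidence": 0.88, "evidence": "fixed 1s delay"},
    ],
}
```

Checking `result.get("pass") is True` instead of truthiness means a missing or malformed field fails closed.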
COMPLIANCE

Audit bundle export.
Tamper-proof. One API call.

Every action in RelayPlane is recorded in a tamper-proof audit chain. Each entry is checksummed and linked to the previous one — if anything is modified, verification fails. Export a compliance bundle for any tenant, any time range, in JSON, CSV, or JSONL.

Enterprise tiers retain audit logs for up to 7 years. Free tier: 7 days. Built on the same audit infrastructure used for SOC 2 evidence collection.

Audit bundle contains
Item | Contents
Request log | Every API call, model, cost, latency
Policy decisions | Allow / block / downgrade with reason
Spec-match results | Pass/fail + criteria evidence
Kill-switch events | Activated by, reason, timestamp
Budget overrides | Who approved, when, amount
Chain verification | HMAC checksum per entry
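The "checksummed and linked to the previous entry" idea is a standard HMAC hash chain. Below is a minimal sketch of that technique, assuming HMAC-SHA256 over the previous MAC plus a canonical JSON encoding of the event. The entry layout, the `GENESIS` value, and the function names are illustrative, not RelayPlane's on-disk format.

```python
import hashlib
import hmac
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # assumption: fixed MAC that precedes the first entry


def entry_mac(key: bytes, prev_mac: str, event: Dict[str, Any]) -> str:
    """MAC over the previous entry's MAC plus this event's canonical JSON."""
    body = json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, prev_mac.encode("ascii") + body, hashlib.sha256).hexdigest()


def append(chain: List[Dict[str, Any]], key: bytes, event: Dict[str, Any]) -> None:
    prev = chain[-1]["mac"] if chain else GENESIS
    chain.append({"event": event, "prev": prev, "mac": entry_mac(key, prev, event)})


def verify(chain: List[Dict[str, Any]], key: bytes) -> bool:
    """Recompute every MAC in order; any edit, reorder, or removal from the middle fails."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev:
            return False
        expected = entry_mac(key, prev, entry["event"])
        if not hmac.compare_digest(expected, entry["mac"]):
            return False
        prev = entry["mac"]
    return True
```

Because each MAC covers the previous one, changing an early entry invalidates every entry after it, which is what makes the exported bundle verifiable end to end.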
GET STARTED

One command. Three minutes.

npm install -g @relayplane/proxy && relayplane init && relayplane start

Works with Claude Code, Cursor, OpenClaw, and any agent that supports ANTHROPIC_BASE_URL or OPENAI_BASE_URL.
Point your agent at localhost:4100 via ANTHROPIC_BASE_URL or OPENAI_BASE_URL. No risk: if RelayPlane goes down, your agent keeps working.
Run relayplane stats in your terminal for a quick cost summary.
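For agents launched from a script, the same redirection can be done by setting both variables in the child process environment. This is a sketch: the `agent_env` helper is hypothetical, and the exact URL form (plain `http://`, no path suffix) is an assumption to verify against your agent's documentation.

```python
import os
from typing import Dict, Mapping, Optional

PROXY_URL = "http://localhost:4100"  # assumption: plain HTTP on the default port


def agent_env(base: Optional[Mapping[str, str]] = None,
              proxy_url: str = PROXY_URL) -> Dict[str, str]:
    """Copy an environment and point both provider base URLs at the proxy."""
    env = dict(os.environ if base is None else base)
    env["ANTHROPIC_BASE_URL"] = proxy_url
    env["OPENAI_BASE_URL"] = proxy_url
    return env


# Usage (the agent command is a placeholder):
# import subprocess
# subprocess.run(["your-agent"], env=agent_env())
```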

Step 1 · Install: npm install -g @relayplane/proxy
Step 2 · Init config: relayplane init
Step 3 · Start proxy: relayplane start
Step 4 · See savings: localhost:4100

Supports: Anthropic · OpenAI · Google Gemini · xAI/Grok · OpenRouter · DeepSeek · Groq · Mistral · Together · Fireworks · Perplexity

ZERO RISK

If RelayPlane crashes,
your agent doesn't notice.

We learned this the hard way. Early versions hijacked provider URLs. One crash took everything down for 8 hours. Never again.

RelayPlane uses a circuit breaker architecture. After 3 failures, all traffic bypasses the proxy automatically. Your agent talks directly to the provider. When RelayPlane recovers, traffic resumes. No manual intervention.

Closed (normal operation) → Open (after 3 failures) → Half-Open (probe recovery) → Closed (recovered)
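The three states map onto the classic circuit breaker pattern. The sketch below is a generic version of that pattern, not RelayPlane's code: the failure threshold of 3 comes from the text above, while the 30-second recovery window, the class, and its method names are assumptions.

```python
import time
from typing import Callable


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3,
                 recovery_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_seconds = recovery_seconds  # assumed cool-down before probing
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def use_proxy(self) -> bool:
        """True: send via the proxy. False: bypass straight to the provider."""
        if self.state == "open":
            if self.clock() - self.opened_at >= self.recovery_seconds:
                self.state = "half-open"  # let a probe request through
                return True
            return False
        return True

    def record_success(self) -> None:
        self.failures = 0
        self.state = "closed"

    def record_failure(self) -> None:
        self.failures += 1
        # A failed probe, or too many consecutive failures, opens the circuit.
        if self.state == "half-open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()
```

The caller checks `use_proxy()` before each request and reports the outcome afterwards, so the bypass decision never depends on the proxy itself being reachable.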
THE DATA

Built on pain we measured.

211K · OpenClaw ecosystem GitHub stars
582K · Views on the API cost crisis
$35–720 · Monthly user spend range
77% · Cost reduction documented
"I was mass spending $200+/month running an agent swarm and had zero visibility into where the money was going. Turns out 73% of my requests were using Opus for tasks Haiku could handle."
Matt Turley, Continuum
FAQ

Common questions

Will this break my OpenClaw setup?

No. RelayPlane uses a circuit breaker architecture. If the proxy fails for any reason, all traffic automatically bypasses it and goes directly to your LLM provider. Your agent doesn't even notice. If RelayPlane can't route, it passes through to your default model. Worst case: you pay what you would have paid anyway. We learned this lesson the hard way and built the safety model first.

Is RelayPlane free?

Yes. RelayPlane is open source (MIT license) and free to self-host. All features work locally with no account required. There are no paid tiers currently.

What data do you collect?

Telemetry is on by default. You can disable it (relayplane telemetry off). We collect anonymized metadata: task type label, token count, model used, latency, estimated cost, and an anonymous device ID. Your prompts, code, and responses are never collected. They go directly to LLM providers.

How does task-aware routing work?

RelayPlane uses heuristic classification (token counts, keyword patterns, code block detection) to label requests by complexity: simple, moderate, or complex. When you enable routing, it maps these labels to models you choose, for example simple to Sonnet or a cheaper capable model, moderate to Sonnet, complex to Opus. Default behavior is passthrough unless you configure routing rules.

Does RelayPlane work with Claude Max subscriptions?

RelayPlane works with Anthropic API keys, pay-as-you-go or prepaid. Anthropic no longer supports Claude Max subscription tokens in third-party tools. If you're moving from Max to API keys, RelayPlane helps you control costs at the API level.

Does it work with models other than Anthropic?

Yes. RelayPlane supports any OpenAI-compatible API: Claude, GPT-4o, Gemini, Mistral, open-source models via OpenRouter. The routing engine is model-agnostic.

How much will I actually save?

It depends on your usage pattern, but most users see 40-70% cost reduction. The biggest savings come from routing simple tasks (which are typically 60-70% of all requests) to cheaper models. You'll see exact numbers in your dashboard within the first hour.

How does RelayPlane compare to OpenRouter?

Different layer entirely. OpenRouter is a multi-provider gateway: you pick the model, it routes to the cheapest provider for that model. RelayPlane picks the right model for the task. RelayPlane is local-first (your machine, your data). OpenRouter is a cloud service you send all prompts through. They're complementary: you can use OpenRouter as a provider behind RelayPlane. RelayPlane adds cost tracking, task classification, and a local dashboard on top.

How does RelayPlane compare to LiteLLM?

LiteLLM is a unified API adapter: call any provider with one SDK. RelayPlane does that and adds configurable task-aware routing, cost tracking, and a dashboard. LiteLLM requires code changes (import litellm). RelayPlane is a proxy: you set ANTHROPIC_BASE_URL to point at it. LiteLLM is a library you integrate. RelayPlane is infrastructure you deploy.

How does the task classification work?

Two-layer pre-flight classification that makes no LLM calls, so the added latency is negligible. First, task type: regex pattern matching on the prompt across 9 categories (code generation, summarization, analysis, etc.), which takes under 5ms. Second, complexity: structural signals like code blocks, token count, and multi-step instructions are scored as simple, moderate, or complex. Routing rules map task type + complexity to a model tier. Cascade mode starts with the cheapest model and auto-escalates on uncertainty or refusal patterns.
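A rough sketch of the two layers, to make the idea concrete. The three regex categories, the scoring thresholds, the 4-characters-per-token estimate, and the function names are all illustrative assumptions. RelayPlane's actual patterns and weights are not shown on this page.

```python
import re

FENCE = "`" * 3  # Markdown code-fence marker

# Layer 1: task type by regex (3 of the 9 categories, patterns assumed).
TASK_PATTERNS = {
    "code_generation": re.compile(r"\b(implement|write|generate|refactor)\b", re.I),
    "summarization": re.compile(r"\bsummari[sz]e\b", re.I),
    "analysis": re.compile(r"\b(analy[sz]e|compare)\b", re.I),
}


def classify_task(prompt: str) -> str:
    for label, pattern in TASK_PATTERNS.items():
        if pattern.search(prompt):
            return label
    return "other"


# Layer 2: complexity from structural signals (thresholds assumed).
def classify_complexity(prompt: str) -> str:
    tokens = len(prompt) // 4                  # crude token estimate
    code_blocks = prompt.count(FENCE) // 2     # opening + closing fence = 1 block
    steps = len(re.findall(r"^\s*(?:\d+\.|[-*])\s", prompt, re.M))

    score = 0
    score += tokens > 500
    score += tokens > 4000
    score += code_blocks >= 1
    score += steps >= 3

    if score == 0:
        return "simple"
    return "moderate" if score <= 2 else "complex"
```

Both layers are pure string operations, which is why this kind of classification can run before every request without a model call.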

Is my data sent anywhere?

Your prompts and responses go directly to LLM providers, never through RelayPlane servers. Telemetry (anonymous metadata: task type, token counts, model, cost) is on by default. Disable it with relayplane telemetry off. MIT licensed, fully auditable.

What happens if I stop using it?

Nothing. Remove the proxy, your agents talk directly to providers again. No lock-in, no migration, no data hostage. It's MIT licensed. You can fork it and run your own if you want.

What if a cheaper model returns bad results?

RelayPlane automatically retries with a better model. You pay for both calls, but that's still cheaper than always using Opus. Collective failure learning is on the roadmap. For now, the proxy logs these so you can adjust your routing rules.
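The retry-with-a-better-model behavior can be sketched as a loop over model tiers from cheapest to most capable. The refusal markers, the `needs_escalation` check, and the `cascade` signature are assumptions for illustration. `call_model` stands in for whatever function actually calls the provider.

```python
from typing import Callable, List, Sequence, Tuple

# Illustrative markers only; real uncertainty detection would be richer.
REFUSAL_MARKERS = ("i cannot", "i can't", "i'm not sure")


def needs_escalation(answer: str) -> bool:
    """Treat empty answers and refusal-like phrasing as a failed attempt."""
    lowered = answer.lower()
    return not answer.strip() or any(m in lowered for m in REFUSAL_MARKERS)


def cascade(prompt: str, tiers: Sequence[str],
            call_model: Callable[[str, str], str]) -> Tuple[str, str, List[str]]:
    """Try each tier in order; return (model used, answer, models tried)."""
    if not tiers:
        raise ValueError("at least one model tier is required")
    tried: List[str] = []
    for model in tiers:
        tried.append(model)
        answer = call_model(model, prompt)
        if not needs_escalation(answer):
            break
    return model, answer, tried
```

The `tried` list records every call that was billed, which is the information needed to tune routing rules when a cheap tier fails often.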

What if RelayPlane shuts down?

The local proxy works forever. It's MIT licensed software on your machine. Only the mesh network would stop. You'd keep ~30% savings on static rules.

How do I know I'm actually saving money?

Run relayplane stats or check the dashboard. We show you exactly how much you've saved vs. what you would have spent.

Govern your agents.
Own the trust layer.

Install RelayPlane. Tenant isolation, spec-match, kill-switch — all in 3 minutes.

npm install -g @relayplane/proxy && relayplane init && relayplane start