v1.9.38 · Open source · MIT · Credential pool · Quota-aware routing · Anomaly detection · Audit trail

Your Claude bill goes 25x on June 15. Stop it at the proxy.

Anthropic capped programmatic Claude per subscription on June 15. RelayPlane spreads load across multiple credentials, fails over before 429s, and stops runaway agents at the budget cap. One proxy, multi-provider, local by default. One env var, no code changes.

Local by default · No telemetry required · 174 GitHub stars · 11 providers supported
Live · 4,200 req/min · routed in < 6 ms · Spend this hour: $12.84
agent → request → localhost:4100
simple → haiku · $0.10/Mtok
agentic → codex / sonnet · plan + overflow
judgment → opus / sonnet · $15/Mtok
60% simple · 22% agentic · 18% judgment

The June 15 wall

$200 used to last a month. Now it lasts a week.

Programmatic Claude is now metered at full API rates on top of your subscription. Across published math, the effective cost multiplier runs from 12x for light scripting to 175x for heavy Sonnet fleets. RelayPlane reroutes the same workload through a credential pool, quota-aware fail-over, and cheap-model offload, so the bill stays inside the plan.

Cost multiplier, same workload
  • Light scripting: 12x
  • Background agents: 25x
  • Heavy Sonnet fleet: 175x
  • Same workload, with RelayPlane: 1.0x

Multipliers from publicly posted June 15 math (Theo Browne, Magna Capax, HN discussion). RelayPlane band assumes credential pool spans Pro / Max / API keys with quota-aware fail-over enabled.

How RelayPlane changes the math
  • Multi-credential pool: round-robin across Pro, Max, and API keys you already own.
  • Quota-aware router: tracks per-credential usage and shifts traffic before the cap hits.
  • Cheap-model offload: routine work goes to Haiku / Groq / 4o-mini, frontier work stays on Opus.
  • Anomaly detection: velocity, repetition, and token-explosion guards stop runaway bills.
  • 30-second install: ANTHROPIC_BASE_URL=http://localhost:4100
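
To make the first two bullets concrete, here is a minimal sketch of a quota-aware round-robin pool. It is not RelayPlane's implementation: the `Credential` and `CredentialPool` names, the per-window request quota, and the 10% reserve threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Credential:
    name: str      # hypothetical label, e.g. "pro", "max", "api"
    quota: int     # requests allowed in the current window (assumed unit)
    used: int = 0


class CredentialPool:
    """Round-robin over credentials, skipping any that are close to their cap."""

    def __init__(self, credentials, reserve_pct=10):
        self.credentials = list(credentials)
        self.reserve_pct = reserve_pct  # stop using a key at (100 - reserve_pct)% of quota
        self._next = 0

    def _has_headroom(self, cred):
        # Integer arithmetic avoids floating-point edge cases at the threshold.
        return cred.used * 100 < cred.quota * (100 - self.reserve_pct)

    def pick(self):
        n = len(self.credentials)
        for offset in range(n):
            cred = self.credentials[(self._next + offset) % n]
            if self._has_headroom(cred):
                # Resume the rotation just after the credential we used.
                self._next = (self._next + offset + 1) % n
                cred.used += 1
                return cred
        raise RuntimeError("all credentials are at or near their quota")


pool = CredentialPool([
    Credential("pro", quota=10),
    Credential("max", quota=20),
    Credential("api", quota=1000),
])
picks = [pool.pick().name for _ in range(40)]
```

In this run the small "pro" credential stops receiving traffic after 9 requests, one short of its cap, and the remaining requests alternate between the other two. Shifting traffic before the cap, rather than reacting to a 429, is the behavior the quota-aware bullet describes.
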
Runaway agents

One stuck agent at API rates is $40 an hour.

Open Claude Code issue #26171: an agent burned 72,900 tokens over 21 minutes in a thinking loop with zero useful output. Before June 15 that was annoying. Since then, every minute of that loop is metered at full API rates. RelayPlane catches velocity spikes, token explosions, and repetition loops in a sliding 100-request window and stops the bill at the budget cap.

Shipped guards in v1.9.38
  • Velocity detection. Sliding window flags req/min spikes before they burn through a budget.
  • Token-explosion guard. Catches the 72k-token loop pattern and blocks the next call.
  • Cost acceleration. If $/min doubles inside the window, requests downgrade or pause based on policy.
  • Hard budget caps. Per-agent, per-tenant, per-day. Action is configurable: block, downgrade, warn, alert.
  • Audit trail. Every halt is checksummed and exportable for incident review.
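
The three detection guards can be sketched over a single sliding window. This is an illustration under stated assumptions, not the shipped detector: the `Request` fields, the thresholds (`max_req_per_min`, `token_factor`, `max_repeats`, `min_samples`), and the use of a prompt hash to spot repetition are all hypothetical. Only the 100-request window comes from the text above.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    timestamp: float   # seconds since some reference point
    tokens: int
    prompt_hash: str   # hypothetical fingerprint of the prompt text


class AnomalyGuard:
    def __init__(self, window=100, max_req_per_min=60.0,
                 token_factor=5.0, max_repeats=3, min_samples=5):
        self.recent = deque(maxlen=window)   # sliding window of past requests
        self.max_req_per_min = max_req_per_min
        self.token_factor = token_factor
        self.max_repeats = max_repeats
        self.min_samples = min_samples

    def check(self, req):
        """Return the names of the guards this request trips, then record it."""
        flags = []
        n = len(self.recent) + 1
        if self.recent and n >= self.min_samples:
            # Velocity: requests per minute across the window.
            span = req.timestamp - self.recent[0].timestamp
            if n * 60.0 / max(span, 1e-9) > self.max_req_per_min:
                flags.append("velocity")
            # Token explosion: far above the window's mean request size.
            mean_tokens = sum(r.tokens for r in self.recent) / len(self.recent)
            if req.tokens > self.token_factor * mean_tokens:
                flags.append("token_explosion")
        # Repetition: the same prompt already seen max_repeats times.
        repeats = sum(1 for r in self.recent if r.prompt_hash == req.prompt_hash)
        if repeats >= self.max_repeats:
            flags.append("repetition")
        self.recent.append(req)
        return flags


guard = AnomalyGuard()
normal = [guard.check(Request(5.0 * i, 1000, f"p{i}")) for i in range(10)]
big = guard.check(Request(50.0, 50_000, "big"))
loop = [guard.check(Request(55.0 + 5 * i, 1000, "loop")) for i in range(4)]
```

A caller would map a non-empty result onto the configured policy (block, downgrade, warn, or alert).
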

The tenant kill-switch HTTP endpoint is on the roadmap for the next two weeks. Anomaly detection and budget caps are live in the local proxy today.

The math

Right task, right model, right rate.

The proxy classifies every request before it leaves your machine. Routine work goes to fast, cheap models. Judgment work goes to the frontier. You stop paying $15/Mtok for tasks that need $0.10.

Task class | Best fit | Rate | Share of your traffic
Code review, judgment, taste | Claude Opus / Sonnet | $3 / $15 per Mtok | ~18%
Agentic coding, planning | ChatGPT Codex / Sonnet | plan credit + overflow | ~22%
Scoring, dedup, voice checks, classification | Haiku · Groq Llama · GPT-4o-mini | $0.10 to $0.80 per Mtok | ~60%

Most agent pipelines are 60 to 80% routine. Routing the routine 60% to a $0.10 model is where the savings live. The frontier still gets the work it deserves.
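
As a rough worked example of that claim, the blended rate can be computed directly. The simplifying assumptions are loud ones: the baseline sends all traffic to a $15/Mtok model, the routine 60% moves to a $0.10/Mtok model, and the remaining 40% (including agentic work, whose plan-credit pricing is not modeled) stays at $15/Mtok.

```python
# Rates in dollars per million tokens, taken from the figures quoted above.
FRONTIER_RATE = 15.00
CHEAP_RATE = 0.10
ROUTINE_SHARE = 0.60   # share of traffic treated as routine


def blended_rate(routine_share, cheap_rate, frontier_rate):
    """Average $/Mtok when the routine share is routed to the cheap model."""
    return routine_share * cheap_rate + (1 - routine_share) * frontier_rate


routed = blended_rate(ROUTINE_SHARE, CHEAP_RATE, FRONTIER_RATE)
saving = 1 - routed / FRONTIER_RATE
print(f"blended: ${routed:.2f}/Mtok, saving: {saving:.0%}")
# prints "blended: $6.06/Mtok, saving: 60%"
```

Under these assumptions the cheap tier contributes almost nothing to the bill, so the saving tracks the routine share of traffic almost one for one.
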
How it works

Three pillars. Honest about what is live.

Observe is in production. Govern is shipping in pieces (budget caps and anomaly detection today, tenant kill-switch endpoint next). Verify is on the roadmap and runs out of band today.

Pillar 1 · Observe · Shipped

See everything, per agent.

Every LLM request flows through the proxy with full attribution. Cost, model, task type, tokens, latency: all live, all namespaced. The per-agent breakdown uses the system-prompt fingerprint, so no annotation work is required.

  • Per-tenant and per-agent cost tracking
  • Cache-aware accounting (Anthropic prompt caching)
  • Tamper-proof, exportable audit trail
  • 7 days free, 30 days Starter, 90 days Pro, unlimited Max
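
One way per-agent attribution from a system-prompt fingerprint could work is sketched below. The hashing scheme (whitespace-normalized SHA-256, truncated to 12 hex characters), the `CostLedger` class, and the example rates are assumptions for illustration, not a description of RelayPlane's internals.

```python
import hashlib
from collections import defaultdict


def fingerprint(system_prompt):
    """Short, stable ID derived from the whitespace-normalized system prompt."""
    normalized = " ".join(system_prompt.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:12]


class CostLedger:
    def __init__(self):
        self.cost_by_agent = defaultdict(float)

    def record(self, system_prompt, input_tokens, output_tokens, rate_in, rate_out):
        """Attribute one request's cost; rates are in dollars per million tokens."""
        cost = (input_tokens * rate_in + output_tokens * rate_out) / 1_000_000
        agent = fingerprint(system_prompt)
        self.cost_by_agent[agent] += cost
        return agent, cost


ledger = CostLedger()
reviewer, cost = ledger.record("You are a code reviewer.", 1_000_000, 100_000, 3.0, 15.0)
same, _ = ledger.record("You are  a code\nreviewer.", 0, 0, 3.0, 15.0)
other, _ = ledger.record("You are a planner.", 0, 0, 3.0, 15.0)
```

Because the ID comes from the prompt itself, two requests from the same agent land in the same bucket without the agent adding any header or tag.
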
Pillar 2 · Govern · Shipping in pieces

Hard budget caps. Anomaly detection.

Daily, hourly, and per-request budget caps with block, downgrade, warn, or alert actions. Velocity spikes, repetition loops, and token explosions are detected in a sliding window. Multi-tenant kill-switch endpoint is next on the roadmap.

  • Budget caps, configurable action per breach (live)
  • Anomaly detection across the 100-req window (live)
  • Credential pool, round-robin across keys (live)
  • Quota-aware fail-over before 429s (live, Pro tier)
  • Tenant pause via HTTP endpoint (spec, next 2 weeks)
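
A budget cap with a configurable breach action might look like the following sketch. The four actions come from the text above. The `BudgetCap` class, the `admit` method, the estimated-cost input, and the `"haiku"` fallback name are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    BLOCK = "block"
    DOWNGRADE = "downgrade"
    WARN = "warn"
    ALERT = "alert"


@dataclass
class BudgetCap:
    limit_usd: float
    action: Action
    spent_usd: float = 0.0

    def admit(self, estimated_cost_usd, model, fallback_model="haiku"):
        """Return (allowed, model_to_use, notice) for one request."""
        if self.spent_usd + estimated_cost_usd <= self.limit_usd:
            self.spent_usd += estimated_cost_usd
            return True, model, None
        # Over the cap: apply the configured action.
        if self.action is Action.BLOCK:
            return False, None, "block"
        if self.action is Action.DOWNGRADE:
            # The fallback model's cost is not tracked in this sketch.
            return True, fallback_model, "downgrade"
        # WARN and ALERT let the request through and surface a notice.
        self.spent_usd += estimated_cost_usd
        return True, model, self.action.value


cap = BudgetCap(limit_usd=1.0, action=Action.BLOCK)
first = cap.admit(0.6, "opus")    # under the cap
second = cap.admit(0.6, "opus")   # would exceed the cap
```

One cap object per agent, tenant, or day would give the granularity described under the shipped guards.
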
Pillar 3 · Verify · On the roadmap

Spec-match before it ships.

Before an agent marks a task done, RelayPlane will score the diff and acceptance criteria with a cheap judge model. Failing tasks retry. Today this lives as a separate evaluator in the clawd pipeline. Moving it into RelayPlane is on deck.

  • Per-criterion pass / fail with evidence
  • Blocker, major, minor severity weighting
  • Judge model configurable, Haiku by default
  • Currently runs in clawd, RelayPlane integration on the roadmap
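
Since this pillar is not yet in the proxy, the following is only a guess at how severity-weighted scoring could combine per-criterion results. The weights, the `max_penalty` threshold, and the rule that any failed blocker fails the task are assumptions. The source names the severities but not how they are weighted.

```python
from dataclasses import dataclass

# Hypothetical weights; the source does not publish values.
SEVERITY_WEIGHT = {"blocker": 10, "major": 3, "minor": 1}


@dataclass
class CriterionResult:
    criterion: str
    severity: str   # "blocker", "major", or "minor"
    passed: bool
    evidence: str   # what the judge model cited for its decision


def verdict(results, max_penalty=3):
    """Pass unless a blocker failed or the summed penalty exceeds max_penalty."""
    failed = [r for r in results if not r.passed]
    penalty = sum(SEVERITY_WEIGHT[r.severity] for r in failed)
    has_blocker = any(r.severity == "blocker" for r in failed)
    return {
        "passed": not has_blocker and penalty <= max_penalty,
        "penalty": penalty,
        "failed": [r.criterion for r in failed],
    }


results = [
    CriterionResult("tests pass", "blocker", True, "CI run green"),
    CriterionResult("docs updated", "minor", False, "README unchanged"),
]
outcome = verdict(results)
```

A failing verdict would be the trigger for the retry the paragraph above mentions.
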
The receipts

Built on pain we measured.

The numbers below are real and current, not aspirations. The proxy version is the live npm tag, the stars are pulled from GitHub at build time, and the providers are the ones we wire to in v1.9.38 today.

v1.9.38 · Live on npm · @relayplane/proxy
174 · GitHub stars · RelayPlane/proxy
11 · Providers supported · Anthropic to local
77% · Documented cost cut · on routine-heavy pipelines

Get started

One command. Three minutes.

Works with Claude Code, Cursor, OpenClaw, and any agent that supports ANTHROPIC_BASE_URL or OPENAI_BASE_URL. Point your agent at localhost:4100 and you are done.

npm install -g @relayplane/proxy && relayplane init && relayplane start
Anthropic · OpenAI · Google Gemini · xAI / Grok · OpenRouter · DeepSeek · Groq · Mistral · Together · Fireworks · Perplexity