RelayPlane is an npm-native Node.js LLM proxy. Install with npm install @relayplane/proxy, point your ANTHROPIC_BASE_URL or OPENAI_BASE_URL at http://localhost:4100, and get per-request cost tracking, complexity-based model routing, and Ollama local fallback, no Docker required.

How does RelayPlane compare to LiteLLM?

LiteLLM is a Python library with a proxy server option. RelayPlane is npm-native for Node.js developers, no Python, no Docker, just npm install. RelayPlane has per-request cost tracking built in and integrates directly with Claude Code, Cursor, and OpenClaw via a simple base URL override.

How does RelayPlane compare to Portkey or Helicone?

Portkey and Helicone offer cloud observability and are npm-compatible. RelayPlane runs entirely locally (no cloud required), is open source under MIT, and focuses on cost tracking plus complexity-based routing for Node.js developers using Claude Code, Cursor, and OpenClaw.

Does RelayPlane work with Claude Code and Cursor?

Yes. RelayPlane is designed for Claude Code, Cursor, and any tool that supports ANTHROPIC_BASE_URL or OPENAI_BASE_URL. Set ANTHROPIC_BASE_URL=http://localhost:4100 and RelayPlane intercepts every request, tracks costs, and routes to cheaper models for simple tasks.

The core RelayPlane proxy and local dashboard are free forever. Open source (MIT), runs locally on your machine, your keys and prompts never leave your box. Pro ($19/mo) adds network routing intelligence and 90-day history. Max ($99/mo) adds team coordination, audit logs, and approval gates. See relayplane.com/pricing for the full breakdown.

How do I install the RelayPlane npm LLM proxy?

Run: npm install -g @relayplane/proxy. Then: relayplane init && relayplane start. The local dashboard runs at http://localhost:4100. Set ANTHROPIC_BASE_URL=http://localhost:4100 in your environment and all Claude requests route through RelayPlane automatically.

relayplanemcpclaude-codeai-agentscost-control

Introducing @relayplane/mcp-server: Budget-Aware AI Routing via MCP

Name: RelayPlane
Author: Matt Turley

Matt Turley·March 10, 2026·5 min read

Last November I woke up to a $340 Anthropic bill. I run AI agents for pretty much everything in my business. Code generation, content, research. I'd left a coding agent on a refactoring task overnight and it got stuck in a retry loop. Three hundred and forty dollars of tokens, burned on nothing useful.

I couldn't even tell which agent caused it. Anthropic's billing page showed me the damage after it was done. There was no way for the agent to ask “how much have I spent?” before making the next call.

That's why I built RelayPlane. And today we're shipping the piece that makes it all click: @relayplane/mcp-server v1.0.0.

What's MCP?

MCP (Model Context Protocol) is a standard that lets you give tools to AI agents. Your agent discovers the tools, calls them when it needs to, and gets structured results back. Think of it like a plugin system for LLMs.

What the MCP Server Does

The RelayPlane MCP server gives your agent two things it doesn't have by default: cost visibility on every LLM call, and hard budget limits that block runaway spending before it happens.

Here are the tools that matter:

relay_run sends a prompt to any supported provider (OpenAI, Anthropic, Google, xAI) and returns the result with token usage and estimated cost attached. Every single call. No delay, no checking a dashboard later.

{
  "output": "I found 3 potential issues in the diff...",
  "model": "anthropic:claude-sonnet-4-5-20250929",
  "usage": {
    "promptTokens": 1847,
    "completionTokens": 632,
    "estimatedProviderCostUsd": 0.012
  }
}

relay_runs_list shows the agent its recent spending. It can see exactly how much each call cost and decide whether to keep using Sonnet or drop to Haiku for simpler tasks.

relay_models_list gives the agent a catalog of every available model with per-token pricing. The agent can compare costs and pick the cheapest model that fits.

relay_workflow_run chains multiple steps into a workflow DAG. Intermediate results stay in the engine instead of bloating your context window. Less context usage, lower costs.

Budget enforcement is baked into the server itself. You set limits once, and every relay_run gets checked automatically:

Limit	Default	Flag
Daily spending	$5.00	`--max-daily-cost`
Per-call cost	$0.50	`--max-single-call-cost`
Hourly requests	100	`--max-calls-per-hour`

When a call would blow the budget, the server blocks it and tells the agent why. No tokens burned. No surprise bills.

Setup in 60 Seconds

One command for Claude Code:

claude mcp add relayplane \
  -e OPENAI_API_KEY=sk-... \
  -e ANTHROPIC_API_KEY=sk-ant-... \
  -- npx @relayplane/mcp-server

Restart Claude Code (full restart, not /mcp), and you're live.

Works the same way in Cursor and Windsurf. Add the MCP server config, pass your API keys, restart. Your agent gets budget-aware routing across all your providers immediately.

For manual config, drop this in ~/.claude.json:

{
  "mcpServers": {
    "relayplane": {
      "type": "stdio",
      "command": "npx",
      "args": ["@relayplane/mcp-server"],
      "env": {
        "OPENAI_API_KEY": "sk-proj-...",
        "ANTHROPIC_API_KEY": "sk-ant-..."
      }
    }
  }
}

Why This Changes How Agents Work

Here's the thing I didn't expect. When you give an agent cost data on every response, it starts self-regulating. An agent that sees $0.068 next to a GPT-5.2 call and $0.002 next to a GPT-5 Mini call will naturally route simpler tasks to the cheaper model. You don't need complex routing rules. You just need the numbers.

Before this, my workflow was: run agents, check the Anthropic dashboard the next morning, feel bad about the number, change nothing.

Now: agents route through RelayPlane, every response includes cost, budget limits block runaway spending, and I actually sleep.

What's Next

This is v1.0.0. It works, it's stable, and it solves the problem I built it for. On the roadmap:

Real-time budget alerts that notify you (not just block) when spending hits thresholds
Multi-agent budget sharing so a team of agents can share a single daily cap
Pre-built workflow skills for common patterns (we ship three today: invoice processing, content pipelines, lead enrichment)

RelayPlane is MIT licensed. BYOK (bring your own keys), so you're paying your providers directly. We don't add markup.

npx @relayplane/mcp-server

npm · GitHub · Docs

If you've been running agents without budget visibility, try it for a week. The first time you see an agent voluntarily pick a cheaper model because it checked its own spending, you'll get it.

No more $340 surprises.

All posts