Introducing @relayplane/mcp-server: Budget-Aware AI Routing via MCP
Last November I woke up to a $340 Anthropic bill. I run AI agents for pretty much everything in my business. Code generation, content, research. I'd left a coding agent on a refactoring task overnight and it got stuck in a retry loop. Three hundred and forty dollars of tokens, burned on nothing useful.
I couldn't even tell which agent caused it. Anthropic's billing page showed me the damage after it was done. There was no way for the agent to ask “how much have I spent?” before making the next call.
That's why I built RelayPlane. And today we're shipping the piece that makes it all click: @relayplane/mcp-server v1.0.0.
What's MCP?
MCP (Model Context Protocol) is a standard that lets you give tools to AI agents. Your agent discovers the tools, calls them when it needs to, and gets structured results back. Think of it like a plugin system for LLMs.
What the MCP Server Does
The RelayPlane MCP server gives your agent two things it doesn't have by default: cost visibility on every LLM call, and hard budget limits that block runaway spending before it happens.
Here are the tools that matter:
relay_run sends a prompt to any supported provider (OpenAI, Anthropic, Google, xAI) and returns the result with token usage and estimated cost attached. Every single call. No delay, no checking a dashboard later.
{
"output": "I found 3 potential issues in the diff...",
"model": "anthropic:claude-sonnet-4-5-20250929",
"usage": {
"promptTokens": 1847,
"completionTokens": 632,
"estimatedProviderCostUsd": 0.012
}
}relay_runs_list shows the agent its recent spending. It can see exactly how much each call cost and decide whether to keep using Sonnet or drop to Haiku for simpler tasks.
relay_models_list gives the agent a catalog of every available model with per-token pricing. The agent can compare costs and pick the cheapest model that fits.
relay_workflow_run chains multiple steps into a workflow DAG. Intermediate results stay in the engine instead of bloating your context window. Less context usage, lower costs.
Budget enforcement is baked into the server itself. You set limits once, and every relay_run gets checked automatically:
| Limit | Default | Flag |
|---|---|---|
| Daily spending | $5.00 | --max-daily-cost |
| Per-call cost | $0.50 | --max-single-call-cost |
| Hourly requests | 100 | --max-calls-per-hour |
When a call would blow the budget, the server blocks it and tells the agent why. No tokens burned. No surprise bills.
Setup in 60 Seconds
One command for Claude Code:
claude mcp add relayplane \
-e OPENAI_API_KEY=sk-... \
-e ANTHROPIC_API_KEY=sk-ant-... \
-- npx @relayplane/mcp-serverRestart Claude Code (full restart, not /mcp), and you're live.
Works the same way in Cursor and Windsurf. Add the MCP server config, pass your API keys, restart. Your agent gets budget-aware routing across all your providers immediately.
For manual config, drop this in ~/.claude.json:
{
"mcpServers": {
"relayplane": {
"type": "stdio",
"command": "npx",
"args": ["@relayplane/mcp-server"],
"env": {
"OPENAI_API_KEY": "sk-proj-...",
"ANTHROPIC_API_KEY": "sk-ant-..."
}
}
}
}Why This Changes How Agents Work
Here's the thing I didn't expect. When you give an agent cost data on every response, it starts self-regulating. An agent that sees $0.068 next to a GPT-5.2 call and $0.002 next to a GPT-5 Mini call will naturally route simpler tasks to the cheaper model. You don't need complex routing rules. You just need the numbers.
Before this, my workflow was: run agents, check the Anthropic dashboard the next morning, feel bad about the number, change nothing.
Now: agents route through RelayPlane, every response includes cost, budget limits block runaway spending, and I actually sleep.
What's Next
This is v1.0.0. It works, it's stable, and it solves the problem I built it for. On the roadmap:
- Real-time budget alerts that notify you (not just block) when spending hits thresholds
- Multi-agent budget sharing so a team of agents can share a single daily cap
- Pre-built workflow skills for common patterns (we ship three today: invoice processing, content pipelines, lead enrichment)
RelayPlane is MIT licensed. BYOK (bring your own keys), so you're paying your providers directly. We don't add markup.
npx @relayplane/mcp-serverIf you've been running agents without budget visibility, try it for a week. The first time you see an agent voluntarily pick a cheaper model because it checked its own spending, you'll get it.
No more $340 surprises.