AI Agent Proxy: How to Track and Control LLM Costs Per Agent in Node.js
You gave your agent a credit card and looked away. That is essentially what happens when you run autonomous AI agents without an AI agent proxy sitting between them and your LLM provider. Claude Code, Codex, custom LangChain loops, AutoGen swarms — they all make API calls on your behalf. And unless something is counting those calls, classifying them, and enforcing limits, you will find out the cost at the end of the month.
Sometimes the discovery is $40. Sometimes it is $500. If you have ever had an agent loop stuck in a retry spiral while you slept, you know the feeling.
The Runaway Agent Cost Problem
The core issue is not that agents are expensive. It is that they are invisible.
A human developer opens a terminal, types a prompt, reads the response. There is friction built in. An autonomous agent has no friction. It fires requests in tight loops, retries on error, cascades one task into five subtasks, and every step costs money with no natural pause point.
The failure modes are predictable:
- Retry loops. Agent hits a 529 overloaded response and retries immediately. Then again. Then 47 more times.
- Task explosion. A single "write me a report" spawns planning calls, research calls, drafting calls, revision calls. A $0.05 task runs $2.80.
- Wrong model for the job. Your agent defaults to Opus for everything. File reads, simple lookups, boilerplate — all hitting the most expensive model available.
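Even well-behaved agent code only mitigates the first of these. For reference, the standard fix on the agent side is capped exponential backoff rather than immediate, unbounded retries. A minimal sketch (illustrative only, not RelayPlane code):

```javascript
// Illustrative retry schedule: exponential backoff with a hard cap on
// both the number of attempts and the per-attempt delay, instead of
// retrying a 529 immediately and forever.
function retryDelaysMs(maxRetries, baseMs = 500, capMs = 30000) {
  const delays = [];
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    // 500ms, 1s, 2s, 4s, ... capped at 30s
    delays.push(Math.min(baseMs * 2 ** attempt, capMs));
  }
  return delays;
}
```

Backoff caps the damage of one failure mode, but it does nothing about task explosion or model selection.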
You need a layer that sees all of this in real time. That is what an AI agent proxy does.
What Per-Agent Cost Tracking Looks Like in Practice
An AI agent proxy sits between your agent and the LLM API. Every request passes through it. The proxy records the model, the token counts, the cost, the latency, and the outcome. This alone is useful, but it is not enough.
Claude Code cost tracking is a good example of why attribution matters. Claude Code runs multiple subagents in a session: a planner, a coder, a reviewer. Without per-agent tracking, you see one lump sum. With an AI agent proxy doing system prompt fingerprinting, you see exactly which subagent spent what.
If you run five agents, you need to know which one is spending money, not just that money is being spent.
RelayPlane solves this with system prompt fingerprinting. Here is how it works:
Every agent has a system prompt. It might be "You are a senior code reviewer" or a 2,000-token instruction set defining an entire persona. The proxy hashes that system prompt on first sight and assigns it an internal agent ID. From that point forward, every request carrying that fingerprint is tracked under the same agent.
No configuration required. No SDK changes. No agent-side code.
The result: you open the local dashboard at localhost:4100 and see a breakdown by agent. Agent A (code reviewer) spent $1.20 today. Agent B (document summarizer) spent $0.08. Agent C (research loop) spent $9.40 in the last two hours and triggered an anomaly alert.
That third one is the one that would have cost you $500 by morning without intervention.
Budget Enforcement and Anomaly Detection
Per-agent tracking is the visibility layer. The enforcement layer is what actually protects you.
RelayPlane ships three enforcement mechanisms, all live in the current release:
Budget caps set daily, hourly, or per-request limits. When an agent hits the cap, the proxy blocks further requests and returns an error, so the agent cannot keep spending even if its own logic never checks a limit.
Anomaly detection watches for cost spikes in short windows. If an agent sends 12 requests to Opus in 30 seconds ($4.20), the pattern triggers an alert that can be configured to warn, block, or auto-downgrade.
Auto-downgrade is the most useful mechanism for runaway loops. When an agent's spend rate exceeds a threshold, the proxy routes subsequent requests to a cheaper model automatically. The agent keeps running. Your bill stops climbing.
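The decision logic amounts to something like the sketch below. The thresholds, return shapes, and model name are made up for illustration; they do not mirror RelayPlane's actual configuration:

```javascript
// Illustrative enforcement: given spend so far and a request's estimated
// cost, decide whether to allow, warn, block, or downgrade the model.
function enforce(spentToday, requestCost, limits) {
  const projected = spentToday + requestCost;
  if (projected > limits.block) {
    // Over the hard cap: either reject outright, or reroute to a cheaper model
    return limits.downgradeModel
      ? { action: "downgrade", model: limits.downgradeModel }
      : { action: "block" };
  }
  if (projected > limits.warn) return { action: "warn" };
  return { action: "allow" };
}
```

The key property is that the check happens per request, before the money is spent, not in a billing report after the fact.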
All of this happens transparently. You point your agent at localhost:4100 instead of api.anthropic.com and the proxy handles the rest.
x402 Self-Signup for Agents (Coming Soon)
The pricing page and onboarding flow at RelayPlane are currently built for humans: browser, credit card, Stripe checkout. That works fine for developers. It breaks down for fully autonomous agents.
An agent running in a CI pipeline or a headless container cannot click through a browser. It cannot enter a credit card number. If the agent decides it wants to upgrade to access more features, it needs a machine-to-machine payment path.
That path is the x402 protocol, a proposed HTTP standard where a server returns 402 Payment Required and the client pays with a small on-chain USDC transaction to receive an API key. No human involved.
RelayPlane is building this signup path now. When it ships, an agent will be able to hit the upgrade endpoint, pay $19 USDC via x402, receive a Pro API key, and configure it locally. No email, no password, no Stripe session. The agent upgrades itself.
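The client side of that flow could look roughly like this. Everything here is speculative: `payUSDC` is a stand-in for an on-chain payment, and the response shapes are invented, since the feature has not shipped:

```javascript
// Speculative sketch of an agent handling 402 Payment Required:
// pay the quoted amount, then retry the same request with the receipt.
async function requestWithX402(doRequest, payUSDC) {
  let res = await doRequest(null); // first attempt, no payment proof
  if (res.status === 402) {
    const receipt = await payUSDC(res.amountUSDC); // on-chain USDC payment (stubbed)
    res = await doRequest(receipt); // retry, now carrying proof of payment
  }
  return res;
}
```

The point of the protocol is exactly this shape: the 402 response carries enough information for the client to pay and retry without a human in the loop.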
RelayPlane Fleet Tier for Agentic Workloads
The Free tier gives you the full proxy with all local features: per-agent tracking, budget enforcement, anomaly detection, auto-downgrade, response caching, cost estimation. No artificial limits on those features.
The Fleet tier at $49/month is built for the next step: running multiple agents that should be learning from each other.
Fleet includes:
- Private fleet mesh — your agents share routing insights with each other across machines
- Cross-agent learning — routing discoveries propagate to your other agents automatically
- Fleet-wide budget allocation — set a total budget across all your agents, not just per-agent caps
- Shared response cache — if agent A already fetched a result, agent B does not pay to fetch it again
- Fleet management API — monitor and configure your entire agent fleet from one endpoint
- Unlimited history with cloud sync — Free tier keeps 7 days locally, Fleet keeps everything with cloud backup
Pro at $19/month sits between the two: 90-day history, personalized cost estimates trained on aggregate network data, and real-time provider health from the collective mesh.
Getting Started in 3 Lines
Install the proxy globally:
```bash
npm install -g @relayplane/proxy
```

Start it:

```bash
relayplane start
```

Point your agent at it by setting the base URL:
```javascript
import Anthropic from "@anthropic-ai/sdk";
import OpenAI from "openai";

// For Anthropic SDK
const anthropicClient = new Anthropic({
  baseURL: "http://localhost:4100",
});

// For OpenAI SDK (also supported)
const openaiClient = new OpenAI({
  baseURL: "http://localhost:4100/openai",
  apiKey: process.env.OPENAI_API_KEY,
});
```

That is it. Your existing agent code does not change. The proxy intercepts every request, tracks costs by agent fingerprint, enforces any budgets you have configured, and surfaces everything in the local dashboard.
To set a budget:

```bash
# Block requests after $5/day total
relayplane config set budget.daily 5.00

# Warn at $3, block at $5
relayplane config set budget.warn 3.00
relayplane config set budget.block 5.00
```

To enable anomaly detection:
```bash
relayplane config set anomaly.enabled true
relayplane config set anomaly.threshold 2.00
relayplane config set anomaly.window 60
```

The proxy logs to ~/.relayplane/telemetry.jsonl immediately. Open localhost:4100 to see the dashboard.
An AI agent proxy is not optional monitoring. It is the thing standing between your agents and an uncapped credit card. Agent LLM cost management belongs in the infrastructure stack from day one, not bolted on after the first surprise bill.
The free tier of RelayPlane gives you enough to catch the $500 surprises. The paid tiers give you the infrastructure to run agents at scale.
```bash
npm install -g @relayplane/proxy && relayplane start
```

Then check localhost:4100. You might be surprised what your agents have been spending while you were not looking.