AI Agent Proxy: How to Track and Control LLM Costs Per Agent in Node.js
You gave your agent a credit card and looked away. That is essentially what happens when you run autonomous AI agents without an AI agent proxy sitting between them and your LLM provider. Claude Code, Codex, custom LangChain loops, AutoGen swarms — they all make API calls on your behalf. And unless something is counting those calls, classifying them, and enforcing limits, you will find out the cost at the end of the month.
Sometimes the discovery is $40. Sometimes it is $500. If you have ever had an agent loop stuck in a retry spiral while you slept, you know the feeling.
The Runaway Agent Cost Problem
The core issue is not that agents are expensive. It is that they are invisible.
A human developer opens a terminal, types a prompt, reads the response. There is friction built in. An autonomous agent has no friction. It fires requests in tight loops, retries on error, cascades one task into five subtasks, and every step costs money with no natural pause point.
The failure modes are predictable:
- Retry loops. Agent hits a 529 overloaded response and retries immediately. Then again. Then 47 more times.
- Task explosion. A single "write me a report" spawns planning calls, research calls, drafting calls, revision calls. A $0.05 task runs $2.80.
- Wrong model for the job. Your agent defaults to Opus for everything. File reads, simple lookups, boilerplate — all hitting the most expensive model available.
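Even well-behaved agent code only mitigates the first of these. For reference, the standard fix on the agent side is capped exponential backoff rather than immediate, unbounded retries. A minimal sketch (illustrative only, not RelayPlane code):

```javascript
// Illustrative retry schedule: exponential backoff with a hard cap on
// both the number of attempts and the per-attempt delay, instead of
// retrying a 529 immediately and forever.
function retryDelaysMs(maxRetries, baseMs = 500, capMs = 30000) {
  const delays = [];
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    // 500ms, 1s, 2s, 4s, ... capped at 30s
    delays.push(Math.min(baseMs * 2 ** attempt, capMs));
  }
  return delays;
}
```

Backoff caps the damage of one failure mode, but it does nothing about task explosion or model selection.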
You need a layer that sees all of this in real time. That is what an AI agent proxy does.
What Per-Agent Cost Tracking Looks Like in Practice
An AI agent proxy sits between your agent and the LLM API. Every request passes through it. The proxy records the model, the token counts, the cost, the latency, and the outcome. This alone is useful, but it is not enough.
Claude Code cost tracking is a good example of why attribution matters. Claude Code runs multiple subagents in a session: a planner, a coder, a reviewer. Without per-agent tracking, you see one lump sum. With an AI agent proxy doing system prompt fingerprinting, you see exactly which subagent spent what.
If you run five agents, you need to know which one is spending money, not just that money is being spent.
RelayPlane solves this with system prompt fingerprinting. Here is how it works:
Every agent has a system prompt. It might be "You are a senior code reviewer" or a 2,000-token instruction set defining an entire persona. The proxy hashes that system prompt on first sight and assigns it an internal agent ID. From that point forward, every request carrying that fingerprint is tracked under the same agent.
No configuration required. No SDK changes. No agent-side code.
The result: you open the local dashboard at localhost:4100 and see a breakdown by agent. Agent A (code reviewer) spent $1.20 today. Agent B (document summarizer) spent $0.08. Agent C (research loop) spent $9.40 in the last two hours and triggered an anomaly alert.
That third one is the one that would have cost you $500 by morning without intervention.
Budget Enforcement and Anomaly Detection
Per-agent tracking is the visibility layer. The enforcement layer is what actually protects you.
RelayPlane ships three enforcement mechanisms, all live in the current release:
Budget caps set daily, hourly, or per-request limits. When an agent hits the cap, the proxy blocks further requests and returns an error, so the agent cannot keep spending even if its own logic never checks a limit.
Anomaly detection watches for cost spikes in short windows. If an agent sends 12 requests to Opus in 30 seconds ($4.20), the pattern triggers an alert that can be configured to warn, block, or auto-downgrade.
Auto-downgrade is the most useful mechanism for runaway loops. When an agent's spend rate exceeds a threshold, the proxy routes subsequent requests to a cheaper model automatically. The agent keeps running. Your bill stops climbing.
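The decision logic amounts to something like the sketch below. The thresholds, return shapes, and model name are made up for illustration; they do not mirror RelayPlane's actual configuration:

```javascript
// Illustrative enforcement: given spend so far and a request's estimated
// cost, decide whether to allow, warn, block, or downgrade the model.
function enforce(spentToday, requestCost, limits) {
  const projected = spentToday + requestCost;
  if (projected > limits.block) {
    // Over the hard cap: either reject outright, or reroute to a cheaper model
    return limits.downgradeModel
      ? { action: "downgrade", model: limits.downgradeModel }
      : { action: "block" };
  }
  if (projected > limits.warn) return { action: "warn" };
  return { action: "allow" };
}
```

The key property is that the check happens per request, before the money is spent, not in a billing report after the fact.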
All of this happens transparently. You point your agent at localhost:4100 instead of api.anthropic.com and the proxy handles the rest.
x402 Self-Signup for Agents (Coming Soon)
The pricing page and onboarding flow at RelayPlane are currently built for humans: browser, credit card, Stripe checkout. That works fine for developers. It breaks down for fully autonomous agents.
An agent running in a CI pipeline or a headless container cannot click through a browser. It cannot enter a credit card number. If the agent decides it wants to upgrade to access more features, it needs a machine-to-machine payment path.
That path is the x402 protocol, a proposed HTTP standard where a server returns 402 Payment Required and the client pays with a small on-chain USDC transaction to receive an API key. No human involved.
RelayPlane is building this signup path now. When it ships, an agent will be able to hit the upgrade endpoint, pay $19 USDC via x402, receive a Pro API key, and configure it locally. No email, no password, no Stripe session. The agent upgrades itself.
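The client side of that flow could look roughly like this. Everything here is speculative: `payUSDC` is a stand-in for an on-chain payment, and the response shapes are invented, since the feature has not shipped:

```javascript
// Speculative sketch of an agent handling 402 Payment Required:
// pay the quoted amount, then retry the same request with the receipt.
async function requestWithX402(doRequest, payUSDC) {
  let res = await doRequest(null); // first attempt, no payment proof
  if (res.status === 402) {
    const receipt = await payUSDC(res.amountUSDC); // on-chain USDC payment (stubbed)
    res = await doRequest(receipt); // retry, now carrying proof of payment
  }
  return res;
}
```

The point of the protocol is exactly this shape: the 402 response carries enough information for the client to pay and retry without a human in the loop.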
RelayPlane Fleet Tier for Agentic Workloads
The Free tier gives you the full proxy with all local features: per-agent tracking, budget enforcement, anomaly detection, auto-downgrade, response caching, cost estimation. No artificial limits on those features.
The Fleet tier at $49/month is built for the next step: running multiple agents that should be learning from each other.
Fleet includes:
- Private fleet mesh — your agents share routing insights with each other across machines
- Cross-agent learning — routing discoveries propagate to your other agents automatically
- Fleet-wide budget allocation — set a total budget across all your agents, not just per-agent caps
- Shared response cache — if agent A already fetched a result, agent B does not pay to fetch it again
- Fleet management API — monitor and configure your entire agent fleet from one endpoint
- Unlimited history with cloud sync — Free tier keeps 7 days locally, Fleet keeps everything with cloud backup
Pro at $19/month sits between the two: 90-day history, personalized cost estimates trained on aggregate network data, and real-time provider health from the collective mesh.
Getting Started in 3 Lines
Install the proxy globally:
```bash
npm install -g @relayplane/proxy
```

Start it:

```bash
relayplane start
```

Point your agent at it by setting the base URL:
```javascript
import Anthropic from "@anthropic-ai/sdk";
import OpenAI from "openai";

// For Anthropic SDK
const anthropicClient = new Anthropic({
  baseURL: "http://localhost:4100",
});

// For OpenAI SDK (also supported)
const openaiClient = new OpenAI({
  baseURL: "http://localhost:4100/openai",
  apiKey: process.env.OPENAI_API_KEY,
});
```

That is it. Your existing agent code does not change. The proxy intercepts every request, tracks costs by agent fingerprint, enforces any budgets you have configured, and surfaces everything in the local dashboard.
To set a budget:

```bash
# Block requests after $5/day total
relayplane config set budget.daily 5.00

# Warn at $3, block at $5
relayplane config set budget.warn 3.00
relayplane config set budget.block 5.00
```

To enable anomaly detection:
```bash
relayplane config set anomaly.enabled true
relayplane config set anomaly.threshold 2.00
relayplane config set anomaly.window 60
```

The proxy logs to ~/.relayplane/telemetry.jsonl immediately. Open localhost:4100 to see the dashboard.
An AI agent proxy is not optional monitoring. It is the thing standing between your agents and an uncapped credit card. Agent LLM cost management belongs in the infrastructure stack from day one, not bolted on after the first surprise bill.
The free tier of RelayPlane gives you enough to catch the $500 surprises. The paid tiers give you the infrastructure to run agents at scale.
```bash
npm install -g @relayplane/proxy && relayplane start
```

Then check localhost:4100. You might be surprised what your agents have been spending while you were not looking.