Why I Built a Lightweight AI Proxy and What I Learned
It started with a $340 bill I didn't see coming.
I was building RelayPlane, an AI proxy for Node.js developers, and using Claude extensively to do it. Agents running tasks, writing code, reviewing PRs, drafting content. All hitting the Anthropic API directly. No visibility, no caps, no routing logic. Just API keys and hope.
Then the invoice landed. I could account for maybe half of it. The rest was somewhere in the logs, probably a rogue agent that ran in a loop, probably fixable, but I had no way to know for sure.
That moment was the whole reason I built a proxy layer in the first place. Let me explain what I learned building it, because most of the interesting stuff wasn't in the code.
The obvious solution was wrong
My first instinct was to add cost tracking to each application. Wrap every API call, log the token counts, tally it up in a database. Reasonable approach. Also completely impractical.
I was running four different services, each with their own API calls. Adding tracking to each one meant duplicating the logic four times, then maintaining it four times as the providers changed their pricing. Every new model release meant updating four codebases.
The proxy layer is the obvious answer once you see it. One place. All traffic flows through it. You track, route, and enforce limits once, and every service downstream benefits automatically.
This is a well-understood pattern in infrastructure. We do it with databases (connection poolers), with HTTP (reverse proxies, load balancers). AI APIs are just another case where the direct connection works until it doesn't.
Routing complexity is the real problem
Cost was the symptom. The underlying problem was routing.
When you're using a single model for everything, you don't need a proxy. But the moment you're running multiple models, you're making routing decisions constantly, mostly implicitly:
- Which model handles code review vs creative writing vs data extraction?
- When one provider rate-limits you, what happens next?
- Do you want to pay Opus-level prices for a task that Haiku handles fine?
Without a proxy, these decisions live scattered across your application code. A check here, a fallback there, a hardcoded model name in a config file that someone updates and forgets to update everywhere else.
With a proxy, routing is configuration. You define the rules once. The proxy enforces them.
The complexity model that's worked best for me: classify requests by complexity, route accordingly. Simple summarization tasks and quick lookups go to faster, cheaper models. Complex reasoning and code generation go to the flagship models. You define the thresholds, the proxy routes. Your costs drop materially without you changing a single model call in your application.
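To make that concrete, here's a minimal sketch of complexity-based routing. The task categories, tiers, and model names are illustrative assumptions, not RelayPlane's actual rule format:

```typescript
// Sketch of complexity-based routing: classify the request, map the
// class to a tier, map the tier to a model. Everything here is a
// hypothetical example, not RelayPlane's real schema.
type Tier = "fast" | "flagship";

// Task kinds mapped to tiers. You define these thresholds once.
const rules: Record<string, Tier> = {
  summarize: "fast",
  lookup: "fast",
  code_generation: "flagship",
  reasoning: "flagship",
};

// Each tier resolves to a concrete model name.
const models: Record<Tier, string> = {
  fast: "claude-3-5-haiku",
  flagship: "claude-opus-4",
};

function routeFor(task: string): string {
  // Unknown tasks default to the flagship tier: the safe, expensive choice.
  const tier: Tier = rules[task] ?? "flagship";
  return models[tier];
}

console.log(routeFor("summarize")); // → claude-3-5-haiku
console.log(routeFor("reasoning")); // → claude-opus-4
```

The point is that the application only names the task; the model choice lives in one table that you can change without touching any call site.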
What I got wrong the first time
I over-engineered the storage layer.
The first version had a full Postgres schema for cost tracking, request logs, everything. Proper normalized tables, indexes, the works. It took two weeks to build and I ripped it out a month later.
SQLite is fine for local proxy workloads. The read/write patterns for a proxy are simple: write one record per request, read aggregates for dashboards. You don't need Postgres for that. You need Postgres when you're running a multi-tenant cloud service with thousands of concurrent users. For a local dev proxy, SQLite handles it without breaking a sweat and it's infinitely easier to ship.
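The whole access pattern fits in a few lines. Here it is sketched with an in-memory array standing in for the SQLite table (field names are illustrative, not RelayPlane's actual schema):

```typescript
// The proxy's storage pattern: append one record per proxied request,
// read simple aggregates for the dashboard. An array stands in for the
// SQLite table here; the shape of the access pattern is the point.
interface RequestRecord {
  model: string;
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
}

const log: RequestRecord[] = [];

// Write path: one append per request. No joins, no updates.
function record(r: RequestRecord): void {
  log.push(r);
}

// Read path: total spend per model, the kind of aggregate a
// dashboard query needs.
function spendByModel(): Map<string, number> {
  const totals = new Map<string, number>();
  for (const r of log) {
    totals.set(r.model, (totals.get(r.model) ?? 0) + r.costUsd);
  }
  return totals;
}

record({ model: "claude-3-5-haiku", inputTokens: 900, outputTokens: 200, costUsd: 0.0015 });
record({ model: "claude-opus-4", inputTokens: 4000, outputTokens: 1200, costUsd: 0.15 });
record({ model: "claude-opus-4", inputTokens: 2500, outputTokens: 800, costUsd: 0.1 });
console.log(spendByModel());
```

Insert-only writes plus aggregate reads is exactly the workload SQLite is good at; nothing here needs a connection pool or a server process.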
If you're building infrastructure tooling, start with the simplest storage that works. You can always swap it out. The cost of complexity is high and you usually can't see it until you've already paid it.
The npm-native bet
There were already proxy options when I started building: LiteLLM is mature and has enormous model coverage. Bifrost is fast and enterprise-focused. Both are good at what they do.
Neither of them is npm install -g @relayplane/proxy and you're running in 30 seconds.
If you're a Node.js developer, every tool that requires Python infrastructure or a Go binary is friction. It's not insurmountable friction. But it's friction. You're adding a runtime dependency for your tooling layer, configuring it, maintaining it.
I wanted something that fits into a Node.js workflow the way a regular npm package does. Install it. Configure it with a JSON file. It runs as a process, same as anything else in your stack.
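For illustration, a config in that spirit might look like this. The keys and structure here are hypothetical, invented for this sketch, not RelayPlane's documented format:

```json
{
  "port": 8787,
  "providers": {
    "anthropic": { "apiKeyEnv": "ANTHROPIC_API_KEY" }
  },
  "routing": {
    "default": "claude-opus-4",
    "rules": [
      { "match": "summarize", "model": "claude-3-5-haiku" }
    ]
  },
  "limits": { "dailyBudgetUsd": 20 }
}
```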
That's the bet RelayPlane makes. Not the widest model coverage. Not the fastest throughput at 10k RPS. The simplest path from zero to a working AI proxy for developers who are already living in the Node.js ecosystem.
What actually surprised me
I expected the hard parts to be technical. Circuit breakers, retry logic, the request parsing, keeping up with provider API changes.
The hard part was behavioral.
When you add a proxy that shows you what things cost in real time, you change how you use the API. I started making better decisions about which model to call for what. I started noticing patterns in what was expensive and asking whether it needed to be. I started catching agent loops before they became $340 surprises.
Visibility changes behavior. This sounds obvious. But the actual experience of watching a dashboard update in real time while an agent runs, seeing the cost accumulate, being able to kill it if something looks wrong, that's different from knowing abstractly that you should care about costs.
The proxy isn't just infrastructure. It's a feedback loop.
What I'd do differently
Start simpler. The right first version of RelayPlane was a request logger with basic token counting. Instead I built a full routing engine, cost analytics, multiple fallback strategies, all at once. Took three times longer than it needed to.
Also: use your own tool from day one. I built RelayPlane for weeks before routing my own Claude calls through it. The moment I did, I found three bugs in the first hour. The bugs I found myself were trivial to fix. The bugs that would've found my users would not have been.
If you're running AI workloads in Node.js and paying directly for API access, a proxy layer is worth the 30 minutes it takes to set up. The cost visibility alone usually pays back that setup time within the first week.
If you want to try RelayPlane:
npm install -g @relayplane/proxy
relayplane init
relayplane start
Docs and source: github.com/turleydesigns/relayplane-workflows
npm: npmjs.com/package/@relayplane/proxy
Matt Turley is a solo developer building RelayPlane. He writes about AI infrastructure, agent workflows, and building in public.