Building a Node.js OpenAI Proxy in 2026: DIY vs @relayplane/proxy
Every team running AI in production eventually hits the same wall: the raw OpenAI client is fine for demos, but production needs auth, retries, logging, cost tracking, and fallback logic. So you build a proxy. Or you think you're going to, until you look at what that actually takes.
This post walks through the DIY path, shows where it breaks down, and then shows the 3-line version using @relayplane/proxy.
The DIY Proxy (What It Actually Looks Like)
Most teams start simple. A thin Express server, forward the request, return the response. Then the production issues start rolling in.
Here's what a real proxy ends up looking like once you've fixed the first three production incidents:
import express from 'express';
import fetch from 'node-fetch';

const app = express();
app.use(express.json());

const MAX_RETRIES = 3;
const RETRY_DELAY_MS = 1000;
// Client keys allowed to call this proxy, comma-separated in the environment.
const ALLOWED_KEYS = new Set(process.env.ALLOWED_API_KEYS?.split(',') ?? []);

function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

async function forwardWithRetry(body, apiKey, retries = 0) {
  const start = Date.now();
  let response;
  try {
    response = await fetch('https://api.openai.com/v1/chat/completions', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${apiKey}`,
      },
      body: JSON.stringify(body),
    });
  } catch (err) {
    // Network-level failure: back off linearly, then retry.
    if (retries < MAX_RETRIES) {
      console.warn(`[proxy] Network error, retrying (${retries + 1}/${MAX_RETRIES})...`);
      await sleep(RETRY_DELAY_MS * (retries + 1));
      return forwardWithRetry(body, apiKey, retries + 1);
    }
    throw err;
  }

  const latencyMs = Date.now() - start;

  // Rate limited: honor Retry-After if OpenAI sent one, otherwise back off linearly.
  if (response.status === 429 && retries < MAX_RETRIES) {
    const retryAfter = response.headers.get('retry-after');
    const delay = retryAfter ? parseInt(retryAfter, 10) * 1000 : RETRY_DELAY_MS * (retries + 1);
    console.warn(`[proxy] Rate limited, retrying after ${delay}ms`);
    await sleep(delay);
    return forwardWithRetry(body, apiKey, retries + 1);
  }

  const json = await response.json();

  // Basic cost tracking (you'll get this wrong and update it every time OpenAI changes pricing)
  const inputTokens = json.usage?.prompt_tokens ?? 0;
  const outputTokens = json.usage?.completion_tokens ?? 0;
  const cost = (inputTokens * 0.000005) + (outputTokens * 0.000015);

  console.log(JSON.stringify({
    timestamp: new Date().toISOString(),
    model: body.model,
    inputTokens,
    outputTokens,
    costUsd: cost.toFixed(6),
    latencyMs,
    statusCode: response.status,
  }));

  return { status: response.status, json };
}

app.post('/v1/chat/completions', async (req, res) => {
  // Per-client auth, so the real OpenAI key never leaves the proxy.
  const clientKey = req.headers['x-client-key'];
  if (!ALLOWED_KEYS.has(clientKey)) {
    return res.status(401).json({ error: 'Unauthorized' });
  }
  try {
    const { status, json } = await forwardWithRetry(req.body, process.env.OPENAI_API_KEY);
    res.status(status).json(json);
  } catch (err) {
    console.error('[proxy] Fatal error:', err.message);
    res.status(500).json({ error: 'Proxy error', message: err.message });
  }
});

app.listen(4000, () => console.log('Proxy running on :4000'));

That's 70+ lines, and it's still missing streaming support, a dashboard, multi-provider routing, per-user spend limits, and model fallbacks. The cost math will be wrong every time OpenAI reprices. You'll patch it after the billing surprise.
Some teams get 6 months in before realizing they've built a second product that doesn't generate revenue.
What Goes Wrong
The problems compound fast.
Streaming isn't just a flag. It's a different response format (SSE), and your JSON parser will break on it. You need to pipe the response stream through, handle partial chunks, and still extract token counts for cost tracking.
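For a sense of scale, here's a rough sketch of what the minimal streaming path adds to the DIY proxy above. The route name is hypothetical, it assumes node-fetch exposes the upstream body as a Node readable stream, and it skips the chunk parsing you'd still need to recover token counts:

// Hypothetical streaming route for the DIY proxy above. It pipes the SSE
// bytes straight through; parsing the chunks to keep cost tracking working
// is the part that actually hurts, and it's omitted here.
app.post('/v1/chat/completions/stream', async (req, res) => {
  const upstream = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({ ...req.body, stream: true }),
  });

  res.status(upstream.status);
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');

  // Pass the upstream SSE stream through untouched.
  upstream.body.pipe(res);
});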
Cost math is fragile. OpenAI has repriced models multiple times. Input and output token prices change independently, and you won't notice until the bill arrives. You need a maintained pricing table, not hardcoded constants.
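The table itself doesn't have to be fancy; the pain is keeping it current. A sketch like this (the prices are illustrative, not a source of truth) at least fails loudly on models it doesn't know about instead of silently billing them at zero:

// Illustrative pricing table, USD per token. The point is the shape, not the
// numbers: these go stale, and an unknown model should not quietly cost $0.
const PRICING_PER_TOKEN = {
  'gpt-4o':      { input: 2.50 / 1e6, output: 10.00 / 1e6 },
  'gpt-4o-mini': { input: 0.15 / 1e6, output: 0.60 / 1e6 },
};

function estimateCostUsd(model, inputTokens, outputTokens) {
  const price = PRICING_PER_TOKEN[model];
  if (!price) return null; // surface "unknown" instead of a wrong number
  return inputTokens * price.input + outputTokens * price.output;
}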
Multi-provider routing becomes necessary the moment a customer needs Claude or Gemini as a fallback. Now you're maintaining auth patterns, retry logic, and response normalization for 3+ providers. Each one has different rate limit headers, different error shapes, different streaming formats.
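Even the table of differences is code you now own. A sketch of just the auth layer, using each provider's documented headers (treat the details as illustrative), looks like this, and it says nothing yet about normalizing responses, errors, or streams:

// Per-provider endpoints and auth headers the DIY proxy has to know about.
// Anthropic wants x-api-key plus a version header; OpenAI and most
// OpenAI-compatible APIs want a Bearer token.
const PROVIDERS = {
  openai: {
    url: 'https://api.openai.com/v1/chat/completions',
    headers: (key) => ({ Authorization: `Bearer ${key}` }),
  },
  anthropic: {
    url: 'https://api.anthropic.com/v1/messages',
    headers: (key) => ({ 'x-api-key': key, 'anthropic-version': '2023-06-01' }),
  },
  // ...one entry per provider, each with its own error shapes, rate limit
  // headers, and streaming format to normalize.
};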
A dashboard feels optional until a customer asks "how much are we spending on AI this month per user?" and you realize your logs are in CloudWatch and the answer takes 3 hours to extract.
The 3-Line Version
npm install -g @relayplane/proxy
relayplane init
relayplane start

Then point your existing OpenAI client at the local proxy:

import OpenAI from 'openai';
const client = new OpenAI({
  baseURL: 'http://localhost:4100/v1',
});

// Everything else is identical to your existing OpenAI code
const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello' }],
});

That's it. Your existing OpenAI SDK code works unchanged. You point baseURL at localhost:4100/v1 and the proxy handles the rest.
What You Get
@relayplane/proxy is version 1.8.10, MIT licensed, and ships with:
Smart model routing. A policy engine routes requests based on complexity. Simple queries go to cheaper models (Sonnet, GPT-4o-mini), complex ones escalate to Opus or GPT-4o. The routing logic is rule-based and configurable via your config file.
Per-request cost tracking. Every request gets a cost estimate logged to SQLite with maintained pricing tables across 11 providers. No more guessing at month-end.
Retries and rate limit handling. Built in. You don't write retry logic.
A local dashboard. Open http://localhost:4100 in your browser and you get a web UI showing spend, latency, model distribution, and error rates. No external service required. For CLI stats, run relayplane stats.
11 providers. OpenAI, Anthropic, Google Gemini, and more, all behind one endpoint. Fallback routing between providers is config-level, not code-level.
Streaming. Works out of the box.
The Difference in Practice
With the DIY proxy, a new provider means new auth code, new retry handling, new response normalization. With @relayplane/proxy, you update a config line.
With the DIY proxy, cost tracking breaks when prices change. With @relayplane/proxy, the pricing tables are maintained in the package.
With the DIY proxy, you debug production issues by grepping logs. With @relayplane/proxy, you open the dashboard.
The DIY proxy is a reasonable thing to build once. Building it three times across three products is where teams start looking for alternatives.
Get Started
npm install -g @relayplane/proxy
relayplane init
relayplane start

Docs and source at relayplane.com. The package is MIT licensed, so you can inspect exactly what's proxying your requests.
If you're already maintaining a handwritten proxy and it's becoming a liability, this is a direct drop-in. Point your OpenAI client at localhost:4100/v1 and the switch takes about 5 minutes.
Matt Turley is a solo developer building RelayPlane. He writes about AI infrastructure, agent workflows, and building in public.