Building a Node.js OpenAI Proxy in 2026: DIY vs @relayplane/proxy
Every team running AI in production eventually hits the same wall: the raw OpenAI client is fine for demos, but production needs auth, retries, logging, cost tracking, and fallback logic. So you build a proxy. Or you think you're going to, until you look at what that actually takes.
This post walks through the DIY path, shows where it breaks down, and then shows the 3-line version using @relayplane/proxy.
The DIY Proxy (What It Actually Looks Like)
Most teams start simple. A thin Express server, forward the request, return the response. Then the production issues start rolling in.
Here's what a real proxy ends up looking like once you've fixed the first three production incidents:
import express from 'express';
import fetch from 'node-fetch';

const app = express();
app.use(express.json());

const MAX_RETRIES = 3;
const RETRY_DELAY_MS = 1000;
// Client keys allowed to call this proxy, comma-separated in the environment.
const ALLOWED_KEYS = new Set(process.env.ALLOWED_API_KEYS?.split(',') ?? []);

function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

async function forwardWithRetry(body, apiKey, retries = 0) {
  const start = Date.now();
  let response;
  try {
    response = await fetch('https://api.openai.com/v1/chat/completions', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${apiKey}`,
      },
      body: JSON.stringify(body),
    });
  } catch (err) {
    // Network-level failure: back off linearly, then retry.
    if (retries < MAX_RETRIES) {
      console.warn(`[proxy] Network error, retrying (${retries + 1}/${MAX_RETRIES})...`);
      await sleep(RETRY_DELAY_MS * (retries + 1));
      return forwardWithRetry(body, apiKey, retries + 1);
    }
    throw err;
  }

  const latencyMs = Date.now() - start;

  // Rate limited: honor Retry-After if OpenAI sent one, otherwise back off linearly.
  if (response.status === 429 && retries < MAX_RETRIES) {
    const retryAfter = response.headers.get('retry-after');
    const delay = retryAfter ? parseInt(retryAfter, 10) * 1000 : RETRY_DELAY_MS * (retries + 1);
    console.warn(`[proxy] Rate limited, retrying after ${delay}ms`);
    await sleep(delay);
    return forwardWithRetry(body, apiKey, retries + 1);
  }

  const json = await response.json();

  // Basic cost tracking (you'll get this wrong and update it every time OpenAI changes pricing)
  const inputTokens = json.usage?.prompt_tokens ?? 0;
  const outputTokens = json.usage?.completion_tokens ?? 0;
  const cost = (inputTokens * 0.000005) + (outputTokens * 0.000015);

  console.log(JSON.stringify({
    timestamp: new Date().toISOString(),
    model: body.model,
    inputTokens,
    outputTokens,
    costUsd: cost.toFixed(6),
    latencyMs,
    statusCode: response.status,
  }));

  return { status: response.status, json };
}

app.post('/v1/chat/completions', async (req, res) => {
  // Per-client auth, so the real OpenAI key never leaves the proxy.
  const clientKey = req.headers['x-client-key'];
  if (!ALLOWED_KEYS.has(clientKey)) {
    return res.status(401).json({ error: 'Unauthorized' });
  }
  try {
    const { status, json } = await forwardWithRetry(req.body, process.env.OPENAI_API_KEY);
    res.status(status).json(json);
  } catch (err) {
    console.error('[proxy] Fatal error:', err.message);
    res.status(500).json({ error: 'Proxy error', message: err.message });
  }
});

app.listen(4000, () => console.log('Proxy running on :4000'));

That's 70+ lines, and it's still missing streaming support, a dashboard, multi-provider routing, per-user spend limits, and model fallbacks. The cost math will be wrong every time OpenAI reprices. You'll patch it after the billing surprise.
Some teams get 6 months in before realizing they've built a second product that doesn't generate revenue.
What Goes Wrong
The problems compound fast.
Streaming isn't just a flag. It's a different response format (SSE), and your JSON parser will break on it. You need to pipe the response stream through, handle partial chunks, and still extract token counts for cost tracking.
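For a sense of scale, here's a rough sketch of what the minimal streaming path adds to the DIY proxy above. The route name is hypothetical, it assumes node-fetch exposes the upstream body as a Node readable stream, and it skips the chunk parsing you'd still need to recover token counts:

// Hypothetical streaming route for the DIY proxy above. It pipes the SSE
// bytes straight through; parsing the chunks to keep cost tracking working
// is the part that actually hurts, and it's omitted here.
app.post('/v1/chat/completions/stream', async (req, res) => {
  const upstream = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({ ...req.body, stream: true }),
  });

  res.status(upstream.status);
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');

  // Pass the upstream SSE stream through untouched.
  upstream.body.pipe(res);
});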
Cost math is fragile. OpenAI has repriced models multiple times. Input and output token prices change independently, and you won't notice until the bill arrives. You need a maintained pricing table, not hardcoded constants.
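The table itself doesn't have to be fancy; the pain is keeping it current. A sketch like this (the prices are illustrative, not a source of truth) at least fails loudly on models it doesn't know about instead of silently billing them at zero:

// Illustrative pricing table, USD per token. The point is the shape, not the
// numbers: these go stale, and an unknown model should not quietly cost $0.
const PRICING_PER_TOKEN = {
  'gpt-4o':      { input: 2.50 / 1e6, output: 10.00 / 1e6 },
  'gpt-4o-mini': { input: 0.15 / 1e6, output: 0.60 / 1e6 },
};

function estimateCostUsd(model, inputTokens, outputTokens) {
  const price = PRICING_PER_TOKEN[model];
  if (!price) return null; // surface "unknown" instead of a wrong number
  return inputTokens * price.input + outputTokens * price.output;
}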
Multi-provider routing becomes necessary the moment a customer needs Claude or Gemini as a fallback. Now you're maintaining auth patterns, retry logic, and response normalization for 3+ providers. Each one has different rate limit headers, different error shapes, different streaming formats.
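Even the table of differences is code you now own. A sketch of just the auth layer, using each provider's documented headers (treat the details as illustrative), looks like this, and it says nothing yet about normalizing responses, errors, or streams:

// Per-provider endpoints and auth headers the DIY proxy has to know about.
// Anthropic wants x-api-key plus a version header; OpenAI and most
// OpenAI-compatible APIs want a Bearer token.
const PROVIDERS = {
  openai: {
    url: 'https://api.openai.com/v1/chat/completions',
    headers: (key) => ({ Authorization: `Bearer ${key}` }),
  },
  anthropic: {
    url: 'https://api.anthropic.com/v1/messages',
    headers: (key) => ({ 'x-api-key': key, 'anthropic-version': '2023-06-01' }),
  },
  // ...one entry per provider, each with its own error shapes, rate limit
  // headers, and streaming format to normalize.
};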
A dashboard feels optional until a customer asks "how much are we spending on AI this month per user?" and you realize your logs are in CloudWatch and the answer takes 3 hours to extract.
The 3-Line Version
npm install -g @relayplane/proxy
relayplane init
relayplane start

Then point your existing OpenAI client at the local proxy:

import OpenAI from 'openai';
const client = new OpenAI({
  baseURL: 'http://localhost:4100/v1',
});

// Everything else is identical to your existing OpenAI code
const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello' }],
});

That's it. Your existing OpenAI SDK code works unchanged. You point baseURL at localhost:4100/v1 and the proxy handles the rest.
What You Get
@relayplane/proxy is version 1.8.10, MIT licensed, and ships with:
Smart model routing. A policy engine routes requests based on complexity. Simple queries go to cheaper models (Sonnet, GPT-4o-mini), complex ones escalate to Opus or GPT-4o. The routing logic is rule-based and configurable via your config file.
Per-request cost tracking. Every request gets a cost estimate logged to SQLite with maintained pricing tables across 11 providers. No more guessing at month-end.
Retries and rate limit handling. Built in. You don't write retry logic.
A local dashboard. Open http://localhost:4100 in your browser and you get a web UI showing spend, latency, model distribution, and error rates. No external service required. For CLI stats, run relayplane stats.
11 providers. OpenAI, Anthropic, Google Gemini, and more, all behind one endpoint. Fallback routing between providers is config-level, not code-level.
Streaming. Works out of the box.
The Difference in Practice
With the DIY proxy, a new provider means new auth code, new retry handling, new response normalization. With @relayplane/proxy, you update a config line.
With the DIY proxy, cost tracking breaks when prices change. With @relayplane/proxy, the pricing tables are maintained in the package.
With the DIY proxy, you debug production issues by grepping logs. With @relayplane/proxy, you open the dashboard.
The DIY proxy is a reasonable thing to build once. Building it three times across three products is where teams start looking for alternatives.
Get Started
npm install -g @relayplane/proxy
relayplane init
relayplane start

Docs and source at relayplane.com. The package is MIT licensed, so you can inspect exactly what's proxying your requests.
If you're already maintaining a handwritten proxy and it's becoming a liability, this is a direct drop-in. Point your OpenAI client at localhost:4100/v1 and the switch takes about 5 minutes.
Matt Turley is a solo developer building RelayPlane. He writes about AI infrastructure, agent workflows, and building in public.