# RelayPlane vs vLLM
vLLM is a Python-based inference server for hosting open-source models on GPU hardware. RelayPlane is an MIT-licensed npm package that proxies cloud LLM APIs (OpenAI, Anthropic, etc.) with cost tracking, multi-provider routing, and spend controls. They solve different problems for different stacks.
## TL;DR
Choose RelayPlane when you want:
- npm install and running in 30 seconds, no GPU required
- Cost tracking and dashboards for cloud API spend (OpenAI, Anthropic)
- Multi-provider routing with fallback and budget controls
- Node.js native: no Python runtime in your stack
- Claude Code and Cursor integration via OPENAI_BASE_URL
vLLM may work for you if you need:
- Self-hosted inference on your own GPU hardware
- Open-source models (Llama, Mistral, Qwen) without cloud API costs
- High-throughput batch inference with PagedAttention
- Full data sovereignty: prompts and completions never leave your hardware
## Feature Comparison
| Feature | RelayPlane | vLLM |
|---|---|---|
| Product type | npm-native proxy for cloud LLM APIs (local-first) | Self-hosted inference server for open-source models on GPU hardware |
| Install method | npm install -g @relayplane/proxy | pip install vllm (Python 3.8+, CUDA GPU required) |
| Setup time | Under 30 seconds | Hours: GPU provisioning, CUDA drivers, multi-GB model download, server config |
| Hardware requirement | None (runs on any machine with Node.js) | NVIDIA GPU with CUDA support required; CPU-only is not supported for production |
| Cloud API support (OpenAI, Anthropic, etc.) | Yes (OpenAI, Anthropic, Mistral, and others, with logging and routing) | No (serves models you download and host yourself) |
| Local model serving (open-source models) | No (proxies cloud APIs; no local inference) | Yes (Llama, Mistral, Qwen, etc., with PagedAttention for high throughput) |
| Cost tracking dashboard | Yes (per-request dollar cost logged to a local SQLite database, with dashboard) | No (self-hosted models have no per-request cloud API cost to track) |
| Language support | Any language via OPENAI_BASE_URL (transparent proxy) | Python-only for server configuration and client integration |
| npm package | Yes (@relayplane/proxy, one-command global install) | No (distributed via pip only) |
| Multi-provider routing | Yes (configurable fallback and cost-based routing across providers) | No (serves locally hosted models; does not route to cloud providers) |
| Spend controls and budget limits | Yes (per-project or per-team limits that block requests at a threshold) | No (not applicable to self-hosted inference) |
| Works with Claude Code and Cursor | Yes (Claude Code, Cursor, Windsurf, Aider via a single OPENAI_BASE_URL) | Not compatible (serves local models, not a cloud API proxy) |
| No account required | Yes (bring your own cloud API keys) | Yes (requires GPU hardware and model weights instead) |
| Pricing model | MIT open source, free to self-host; pay only for cloud API tokens you use | Apache 2.0 open source, free software (GPU hardware, electricity, and storage cost not included) |
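The spend-control row above reduces to a simple threshold check. A minimal sketch of that logic (the dollar figures and blocking behavior are illustrative assumptions, not RelayPlane's actual implementation):

```shell
# Illustrative budget guard: block requests once spend crosses the limit.
budget_usd=50.00
spent_usd=52.75   # in practice this would come from the request-cost log

over=$(awk -v s="$spent_usd" -v b="$budget_usd" 'BEGIN { print (s > b) ? 1 : 0 }')
if [ "$over" -eq 1 ]; then
  echo "budget exceeded: blocking request"
fi
```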
## Understanding the Difference: Cloud Proxy vs Local Inference
### RelayPlane and vLLM solve different problems
vLLM is an inference server: it downloads open-source models and runs them on your GPU. RelayPlane is a cloud API proxy: it forwards your requests to OpenAI, Anthropic, and other providers while tracking cost and routing intelligently. These tools are not direct competitors. If you use cloud LLM APIs (most Node.js developers do), vLLM is not relevant to your stack. If you self-host open-source models on GPU hardware, RelayPlane cannot serve inference for you.
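Both tools end up speaking the same OpenAI-style chat completions format, which makes the difference concrete: the same request shape hits a proxy in one case and a local GPU server in the other. The hosts, ports, and model names below are illustrative assumptions (8000 is vLLM's documented default for its OpenAI-compatible server):

```shell
# 1) Through RelayPlane: forwarded to a cloud provider, cost logged locally
#    (proxy port is an assumption; use whatever the proxy reports on startup)
curl http://localhost:8787/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "hi"}]}'

# 2) Against vLLM: answered by a model running on your own GPU
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "user", "content": "hi"}]}'
```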
### npm install vs hours of GPU infrastructure setup
Run npm install -g @relayplane/proxy and you are proxying cloud LLM requests in under 30 seconds. vLLM, by contrast, requires:
- a CUDA-compatible NVIDIA GPU (or a cloud GPU instance at $1+ per hour)
- correctly installed CUDA drivers
- Python 3.8+ and pip
- pip install vllm, which downloads several GB of dependencies
- a model weight download (7B models are 13+ GB; 70B models are 130+ GB)
- inference server configuration

For a developer who just wants to track what their OpenAI calls cost, that is the wrong tool entirely.
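For contrast, the vLLM path looks roughly like this. The commands are a sketch, not a verified walkthrough: the serve entrypoint varies by vLLM version, the model name is one example, and a working CUDA driver install is assumed to already exist:

```shell
# Assumes an NVIDIA GPU with working CUDA drivers (hours of setup on their own)
pip install vllm   # pulls several GB of Python and CUDA dependencies

# Download and serve an open-source model; weights alone are 13+ GB for a 7B model
vllm serve meta-llama/Meta-Llama-3-8B-Instruct
```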
### Node.js native, no Python required
RelayPlane is built for Node.js developers and AI agent builders. It installs via npm, runs as a Node.js process, and integrates with any JavaScript or TypeScript project without adding Python to your stack. vLLM is Python-only: the server is configured in Python, the client integrations are Python-first, and the entire ecosystem assumes a Python runtime. For Node.js teams, adding vLLM means adding and maintaining a Python environment alongside your JavaScript codebase.
### Cloud API cost tracking that vLLM cannot provide
When you call OpenAI or Anthropic, each request has a real dollar cost based on token usage. RelayPlane captures every request, calculates the exact cost using current provider pricing, and stores it in a local SQLite database with a cost dashboard. vLLM has no equivalent: it runs models you downloaded, so there is no per-request cloud API cost to track. If you use both vLLM for some workloads and cloud APIs for others, RelayPlane handles the cloud cost tracking side of that equation.
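The arithmetic behind per-request cost tracking is straightforward token math. A minimal sketch, where the prices are made-up placeholders rather than current provider rates:

```shell
# Hypothetical per-request cost calculation from token usage.
# Prices are illustrative placeholders, NOT real provider pricing.
input_tokens=1200
output_tokens=350
input_price=2.50    # assumed $ per 1M input tokens
output_price=10.00  # assumed $ per 1M output tokens

cost=$(awk -v it="$input_tokens" -v ot="$output_tokens" \
           -v ip="$input_price" -v op="$output_price" \
           'BEGIN { printf "%.6f", (it * ip + ot * op) / 1000000 }')
echo "request cost: \$$cost"
```

RelayPlane does this per request against current provider price tables; the point here is only that the cost is a deterministic function of the token counts the provider already returns.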
## vLLM Solves GPU Inference Problems. RelayPlane Solves Cloud API Cost Problems.
vLLM is a genuinely impressive piece of engineering. Its PagedAttention algorithm and continuous batching make it one of the fastest open-source inference servers available, and for teams with dedicated GPU infrastructure and a need to serve open-source models at scale, it is the right choice. The vLLM project has over 40,000 GitHub stars for good reason.
But if you are a Node.js developer calling OpenAI or Anthropic APIs, vLLM does not belong in your stack. RelayPlane does. It installs in one npm command, requires no GPU, no Python, and no hardware investment. It proxies your existing cloud API calls and gives you per-request cost tracking, multi-provider routing, spend controls, and a local dashboard in under 30 seconds. No agent. No account. No cloud dependency.
## Get Running in 30 Seconds
No GPU. No Python. No account:
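A minimal quick start might look like the following. The install command is the one documented above; the start command and port are assumptions to verify against the package README:

```shell
# 1. Install the proxy globally (no GPU, no Python, no account)
npm install -g @relayplane/proxy

# 2. Start the proxy (command name assumed; check the package README)
relayplane start

# 3. Point any OpenAI-compatible tool at it (port is illustrative)
export OPENAI_BASE_URL="http://localhost:8787/v1"
```

Claude Code, Cursor, and any other tool that honors OPENAI_BASE_URL will pick up the proxy automatically; unset the variable to bypass it.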