155K+ GitHub stars. The AI coding agent that's changing how developers work. One session can make hundreds of API calls — and token costs spiral fast.
A typical debugging session? 50+ Claude calls. A refactoring task? 100+ calls. Using Opus for everything means $3-10+ per task in API fees.
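The arithmetic behind that range can be sketched quickly. The per-million-token prices and token counts below are illustrative assumptions for the example, not current list prices:

```python
# Rough per-task cost comparison. Prices ($ per million tokens) and token
# counts are illustrative assumptions, not quoted rates.
PRICES = {
    "opus":  {"in": 15.00, "out": 75.00},
    "haiku": {"in": 0.25,  "out": 1.25},
}

def task_cost(model, calls, in_tok_per_call, out_tok_per_call):
    p = PRICES[model]
    return calls * (in_tok_per_call * p["in"] + out_tok_per_call * p["out"]) / 1_000_000

# A 100-call refactoring task with ~3k input / ~1k output tokens per call:
opus_cost = task_cost("opus", 100, 3000, 1000)    # ~$12 on Opus alone
haiku_cost = task_cost("haiku", 100, 3000, 1000)  # ~$0.20 on Haiku alone
```

Even at these rough numbers, routing the bulk of calls to a cheaper model dominates the total.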
Route between any model seamlessly
POST /v1/messages — Any SDK respecting BASE_URL redirects through the local proxy
→ claude-3-5-haiku — Analyzes the prompt, infers the task type, selects the optimal model for cost/quality
← SSE stream — Direct connection to the provider; the response streams back and the outcome is logged
RelayPlane doesn't just route — it learns from every request. Track outcomes, detect patterns, and continuously improve routing for your specific codebase.
relayplane:auto — Infers the task type from the prompt and routes to the optimal model. Best balance of cost and quality.
code_review → haiku
complex_analysis → sonnet
relayplane:cost — Always routes to the cheapest models. Maximum savings; upgrades on failure.
everything → haiku # retry with sonnet if needed
relayplane:quality — Uses the best available model. Maximum quality, equivalent to running with no optimization.
everything → opus # no downgrade
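The three policies can be sketched as a small selection function. Model names and the task-to-tier mapping below are placeholders for illustration, not the proxy's actual code:

```python
# Hypothetical sketch of the three routing modes; the real logic is richer.
CHEAP = "claude-3-haiku-20240307"
MID = "claude-3-5-sonnet-20241022"
BEST = "claude-3-opus-20240229"

def pick_model(mode: str, task_type: str) -> str:
    if mode == "relayplane:cost":
        return CHEAP   # always cheapest; retry upward on failure
    if mode == "relayplane:quality":
        return BEST    # no downgrade
    # relayplane:auto: map the inferred task type to a tier
    tiers = {"code_review": CHEAP, "summarization": CHEAP,
             "complex_analysis": MID, "analysis": MID}
    return tiers.get(task_type, MID)
```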
RelayPlane analyzes your prompts and infers task types to make optimal routing decisions.
code_generation — Writing new code
code_review — Reviewing, debugging
summarization — Condensing text
analysis — Deep reasoning
question_answering — Direct questions
data_extraction — Structured parsing
creative_writing — Stories, copy
translation — Language conversion
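One way such inference can work is a lightweight keyword heuristic. The sketch below illustrates the idea only; the keyword lists are invented and this is not RelayPlane's classifier:

```python
# Toy task-type inference from prompt text. Keyword lists are assumptions
# chosen for illustration.
KEYWORDS = {
    "code_review": ("review", "debug", "fix this bug"),
    "summarization": ("summarize", "tl;dr", "condense"),
    "translation": ("translate", "into french", "into german"),
    "data_extraction": ("extract", "parse out", "as json"),
}

def infer_task_type(prompt: str) -> str:
    p = prompt.lower()
    for task, words in KEYWORDS.items():
        if any(w in p for w in words):
            return task
    return "question_answering"  # fallback for direct questions
```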
Config file: ~/.relayplane/config.json — hot-reloads on save.
{
"enabled": true,
"routing": {
"mode": "cascade",
"cascade": {
"enabled": true,
"models": [
"claude-3-haiku-20240307",
"claude-3-5-sonnet-20241022",
"claude-3-opus-20240229"
],
"escalateOn": "uncertainty",
"maxEscalations": 1
},
"complexity": {
"enabled": true,
"simple": "claude-3-haiku-20240307",
"moderate": "claude-3-5-sonnet-20241022",
"complex": "claude-3-opus-20240229"
}
},
"reliability": {
"cooldowns": {
"enabled": true,
"allowedFails": 3,
"windowSeconds": 60,
"cooldownSeconds": 120
}
},
"modelOverrides": {}
}"cascade" — Try cheap models first, escalate on uncertainty
"standard" — Direct task-based routing
"uncertainty" — Escalate when model seems unsure
"refusal" — Escalate on refusal patterns
"error" — Escalate on API errors
Automatically classifies prompts as simple/moderate/complex and routes to the configured model for each tier.
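A plausible stand-in for the simple/moderate/complex split is a length-and-structure heuristic. The thresholds and signals below are invented for illustration:

```python
# Toy complexity classifier; thresholds are illustrative assumptions,
# not RelayPlane's actual heuristics.
def classify_complexity(prompt: str) -> str:
    words = len(prompt.split())
    has_code = "def " in prompt or "{" in prompt
    if words < 40 and not has_code:
        return "simple"
    if words < 400:
        return "moderate"
    return "complex"
```

Each tier then maps directly to the `simple`/`moderate`/`complex` models in the config above.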
Auto-disable failing providers. 3 failures in 60s triggers 120s cooldown.
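That rule amounts to a sliding-window circuit breaker. A sketch matching the defaults above (allowedFails=3, windowSeconds=60, cooldownSeconds=120), with an injectable clock for clarity:

```python
import time

# Sliding-window breaker: 3 failures within 60 s trips a 120 s cooldown.
class Cooldown:
    def __init__(self, allowed_fails=3, window=60, cooldown=120, clock=time.monotonic):
        self.allowed_fails, self.window, self.cooldown = allowed_fails, window, cooldown
        self.clock = clock
        self.fails = []      # timestamps of recent failures
        self.until = 0.0     # provider disabled until this time

    def record_failure(self):
        now = self.clock()
        # keep only failures inside the window, then add this one
        self.fails = [t for t in self.fails if now - t < self.window] + [now]
        if len(self.fails) >= self.allowed_fails:
            self.until = now + self.cooldown
            self.fails.clear()

    def available(self) -> bool:
        return self.clock() >= self.until
```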
/control/status — Proxy status and current configuration
{ "enabled": true, "mode": "cascade", "modelOverrides": {} }

/control/stats — Aggregated statistics and routing counts
{ "totalRequests": 142, "successRate": "97.2%", "avgLatencyMs": 1203, "modelCounts": {...} }

/control/enable — Enable routing (returns { enabled: true })
{ "enabled": true }

/control/disable — Disable routing (passthrough mode)
{ "enabled": false }

/control/config — Update config (merges with existing, hot-reload)
{ "ok": true, "config": {...} }

POST /v1/messages — Native Anthropic API, for Claude Code and direct Claude integrations
POST /v1/chat/completions — OpenAI-compatible API, works with any OpenAI SDK
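The "merges with existing" behavior of `/control/config` amounts to a recursive dictionary merge: values from the patch win, and nested objects merge rather than replace. A sketch of that semantics (not the proxy's source):

```python
# Recursive config merge: patch values win, nested dicts merge.
def deep_merge(base: dict, patch: dict) -> dict:
    out = dict(base)
    for k, v in patch.items():
        if isinstance(v, dict) and isinstance(out.get(k), dict):
            out[k] = deep_merge(out[k], v)
        else:
            out[k] = v
    return out

current = {"enabled": True, "routing": {"mode": "cascade", "cascade": {"maxEscalations": 1}}}
patch = {"routing": {"cascade": {"maxEscalations": 2}}}
merged = deep_merge(current, patch)
# "mode" and "enabled" survive; only maxEscalations changes
```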
Start it with verbose mode to see what's happening:
npx @relayplane/proxy --port 3001 -v
Check that your BASE_URL is set correctly:
echo $ANTHROPIC_BASE_URL # Should be http://localhost:3001
Unset the BASE_URL or add the bypass header:
unset ANTHROPIC_BASE_URL # Or use header: X-RelayPlane-Bypass: true
Check the stats endpoint or look at proxy logs:
curl http://localhost:3001/control/stats
Make sure your API key is set:
export ANTHROPIC_API_KEY="sk-ant-..."