Smart Routing
How RelayPlane classifies tasks, scores complexity, and selects the optimal model for each request.
Routing Modes
The proxy supports four routing modes, controlled by routing.mode in config:
- auto — Always classify and route by complexity, even when the client sends a specific model name. Recommended for maximum savings.
- cascade — Start with a cheap model, escalate if the response shows uncertainty or refusal.
- standard — Use learned routing rules from the core engine.
- passthrough — No routing; forward requests as-is to the specified provider.
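As a minimal sketch of how the mode setting might be read, the snippet below validates a `routing.mode` value from an in-memory config. The dictionary layout and the `get_routing_mode` helper are assumptions for illustration; the actual config file format is not described here.

```python
# The four mode names come from the list above; everything else is illustrative.
VALID_MODES = {"auto", "cascade", "standard", "passthrough"}


def get_routing_mode(config: dict) -> str:
    """Return config['routing']['mode'] or raise if it is missing or unknown."""
    mode = config.get("routing", {}).get("mode")
    if mode not in VALID_MODES:
        raise ValueError(f"routing.mode must be one of {sorted(VALID_MODES)}, got {mode!r}")
    return mode


# Hypothetical config, shown as a Python dict.
example_config = {"routing": {"mode": "cascade"}}
print(get_routing_mode(example_config))
```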
Complexity Classification
The classifier analyzes only the last user message (not system prompts) and assigns a numeric score:
| Signal | Score | Patterns |
|---|---|---|
| Code indicators | +2 | Code blocks, function/class/const/let/import |
| Analytical tasks | +2 | analyze, compare, evaluate, assess, review, audit |
| Math/logic | +2 | calculate, compute, solve, equation, prove, derive |
| Multi-step | +2 | first...then, step 1/2, phase N |
| Architecture | +3 | architect, infrastructure, distributed, microservice, system design |
| Creative writing | +2 | write a story/essay/article, create a, design a |
| Implementation | +2 | implement, refactor, debug, optimize, migrate |
| Planning | +1 | strategy, roadmap, plan for |
| Long content | +1 to +4 | >500 tokens: +1, >2000: +2, >5000: +2 |
| Multiple requirements | +1 to +2 | ≥3 "and": +1, ≥5: +1 more |
Score mapping: <2 = simple, 2-3 = moderate, ≥4 = complex
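The scoring table can be approximated with a short sketch. This is not the proxy's actual classifier: the regular expressions are rough stand-ins for the listed patterns, each signal is assumed to count at most once, and the long-content signal is left out because it needs a tokenizer. The names `SIGNALS`, `score_message`, and `complexity_level` are hypothetical.

```python
import re

# (signal, points, pattern). Patterns are approximations of the table above.
SIGNALS = [
    ("code", 2, r"```|\b(function|class|const|let|import)\b"),
    ("analytical", 2, r"\b(analyze|compare|evaluate|assess|review|audit)\b"),
    ("math", 2, r"\b(calculate|compute|solve|equation|prove|derive)\b"),
    ("multi_step", 2, r"\bfirst\b.*\bthen\b|\bstep [12]\b|\bphase \d+\b"),
    ("architecture", 3, r"\b(architect|infrastructure|distributed|microservice|system design)\b"),
    ("creative", 2, r"\bwrite an? (story|essay|article)\b|\bcreate a\b|\bdesign a\b"),
    ("implementation", 2, r"\b(implement|refactor|debug|optimize|migrate)\b"),
    ("planning", 1, r"\b(strategy|roadmap|plan for)\b"),
]


def score_message(last_user_message: str) -> int:
    """Sum the points of every signal that fires (assumed: once per signal)."""
    text = last_user_message.lower()
    score = sum(points for _, points, pattern in SIGNALS
                if re.search(pattern, text, re.DOTALL))
    # Multiple requirements: >=3 occurrences of "and" adds 1, >=5 adds 1 more.
    and_count = len(re.findall(r"\band\b", text))
    if and_count >= 3:
        score += 1
    if and_count >= 5:
        score += 1
    return score


def complexity_level(score: int) -> str:
    """Score mapping: <2 simple, 2-3 moderate, >=4 complex."""
    if score < 2:
        return "simple"
    if score < 4:
        return "moderate"
    return "complex"
```

Under these assumptions, "Implement a distributed cache and debug it" fires the implementation (+2) and architecture (+3) signals, giving a score of 5 and a level of complex.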
Task Type Inference
Separately from complexity, the core engine infers one of 9 task types from the prompt text. This is used for telemetry categorization and learned routing rules.
Cascade Escalation
In cascade mode, the proxy detects when a cheaper model's response is insufficient:
- Uncertainty — phrases like "I'm not sure", "it's hard to say", "I can't definitively"
- Refusal — phrases like "I can't help with that", "as an AI"
- Error — provider returns an error
When escalation triggers, the request is re-sent to the next model in the cascade chain.
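The escalation loop might look like the following sketch. The phrase lists are taken from the bullets above; `call_model` is a hypothetical callable standing in for the provider request, and returning the last response when the chain is exhausted is an assumption, since that case is not described here.

```python
from typing import Callable, Optional, Sequence, Tuple

UNCERTAINTY_PHRASES = ("i'm not sure", "it's hard to say", "i can't definitively")
REFUSAL_PHRASES = ("i can't help with that", "as an ai")


def needs_escalation(response_text: str) -> bool:
    """True if the response contains an uncertainty or refusal phrase."""
    lowered = response_text.lower()
    return any(p in lowered for p in UNCERTAINTY_PHRASES + REFUSAL_PHRASES)


def run_cascade(prompt: str, chain: Sequence[str],
                call_model: Callable[[str, str], str]) -> Tuple[str, str]:
    """Try each model in order; return (model, text) of the first good answer."""
    last_result: Optional[Tuple[str, str]] = None
    last_error: Optional[Exception] = None
    for model in chain:
        try:
            text = call_model(model, prompt)
        except Exception as exc:  # provider error: escalate to the next model
            last_error = exc
            continue
        last_result = (model, text)
        if not needs_escalation(text):
            return last_result
    # Assumption: with no models left, fall back to the last response, if any.
    if last_result is not None:
        return last_result
    raise RuntimeError("every model in the cascade chain failed") from last_error
```

Plain substring matching is used for simplicity; it can produce false positives (for example, any sentence that happens to contain "as an ai").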
Model Resolution
The proxy resolves models in this order:
- Check `modelOverrides` in config
- Check RelayPlane aliases (`relayplane:auto` → `rp:balanced`)
- Check smart aliases (`rp:best`, `rp:fast`, etc.)
- Check model mapping (`sonnet` → `claude-sonnet-4-6`)
- Auto-detect by prefix (`claude-*` → Anthropic, `gpt-*` → OpenAI, etc.)
- Try `provider/model` format (`anthropic/claude-3-5-sonnet-latest`)
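The resolution order can be sketched as a first-match lookup. The tables below contain only the examples given above; the function name, the return shape, and the choice to stop at the first matching step (rather than feed an alias back through later steps) are assumptions.

```python
from typing import Dict, Optional, Tuple

# Entries are limited to the examples listed in this section.
RELAYPLANE_ALIASES = {"relayplane:auto": "rp:balanced"}
SMART_ALIASES = {"rp:best", "rp:fast"}
MODEL_MAPPING = {"sonnet": "claude-sonnet-4-6"}
PREFIX_PROVIDERS = {"claude-": "anthropic", "gpt-": "openai"}


def resolve_model(name: str, model_overrides: Dict[str, str]) -> Tuple[str, str, Optional[str]]:
    """Return (matched_step, model, provider_or_None) for the first step that matches."""
    if name in model_overrides:
        return ("override", model_overrides[name], None)
    if name in RELAYPLANE_ALIASES:
        return ("relayplane_alias", RELAYPLANE_ALIASES[name], None)
    if name in SMART_ALIASES:
        # What each smart alias expands to is not specified here.
        return ("smart_alias", name, None)
    if name in MODEL_MAPPING:
        return ("model_mapping", MODEL_MAPPING[name], None)
    for prefix, provider in PREFIX_PROVIDERS.items():
        if name.startswith(prefix):
            return ("prefix", name, provider)
    if "/" in name:
        provider, model = name.split("/", 1)
        return ("provider_model", model, provider)
    raise ValueError(f"cannot resolve model {name!r}")
```

For example, `resolve_model("anthropic/claude-3-5-sonnet-latest", {})` skips the prefix step (the name starts with `anthropic/`, not `claude-`) and is resolved by the `provider/model` step.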
Response Headers
Every proxied response includes routing metadata headers:
- `x-relayplane-routed-model` — Actual model used
- `x-relayplane-requested-model` — Original model requested
- `x-relayplane-complexity` — Inferred complexity level
- `x-relayplane-provider` — Provider used
- `x-relayplane-routing-mode` — Which routing mode was applied
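A client can collect these headers into a single record for logging. The sketch below works on any plain mapping of header names to values; the `routing_metadata` helper and the sample values are hypothetical. Header names are lowercased first because HTTP field names are case-insensitive.

```python
from typing import Dict, Mapping, Optional

# Header name -> key used in the returned record (key names are illustrative).
ROUTING_HEADERS = {
    "x-relayplane-routed-model": "routed_model",
    "x-relayplane-requested-model": "requested_model",
    "x-relayplane-complexity": "complexity",
    "x-relayplane-provider": "provider",
    "x-relayplane-routing-mode": "routing_mode",
}


def routing_metadata(headers: Mapping[str, str]) -> Dict[str, Optional[str]]:
    """Extract routing metadata; missing headers map to None."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return {key: lowered.get(header) for header, key in ROUTING_HEADERS.items()}


# Made-up sample values, for illustration only.
sample = {
    "Content-Type": "application/json",
    "X-RelayPlane-Routed-Model": "claude-3-5-haiku",
    "X-RelayPlane-Requested-Model": "claude-sonnet-4-6",
    "X-RelayPlane-Complexity": "simple",
}
print(routing_metadata(sample))
```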
Suffix Routing
Append a routing suffix to any model name to override the routing strategy for that request:
- `claude-sonnet-4-6:cost` — Force cost optimization
- `gpt-4o:quality` — Force highest quality
- `claude-3-5-haiku:fast` — Force fastest model
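Splitting a suffix from a model name could be done as in the sketch below. Only the three suffixes shown above are recognized. Smart aliases such as `rp:fast` also contain a colon, and how the proxy tells the two apart is not documented here, so the `rp` guard is an assumption. The helper name is hypothetical.

```python
from typing import Optional, Tuple

ROUTING_SUFFIXES = {"cost", "quality", "fast"}


def split_routing_suffix(model: str) -> Tuple[str, Optional[str]]:
    """Return (base_model, suffix) or (model, None) if no routing suffix is present."""
    base, sep, suffix = model.rpartition(":")
    # Assumption: a bare "rp:<name>" is a smart alias, not a model plus suffix.
    if sep and suffix in ROUTING_SUFFIXES and base != "rp":
        return base, suffix
    return model, None
```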
Use the `X-RelayPlane-Model` header to override the model for a single request without changing the request body. Use `X-RelayPlane-Bypass: true` to skip routing entirely.
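A small helper can build these per-request headers for whichever HTTP client is in use. The function name is hypothetical, and what the proxy does when both headers are sent together is not specified here.

```python
from typing import Dict, Optional


def override_headers(model: Optional[str] = None, bypass: bool = False) -> Dict[str, str]:
    """Build the per-request override headers described above."""
    headers: Dict[str, str] = {}
    if model is not None:
        headers["X-RelayPlane-Model"] = model
    if bypass:
        headers["X-RelayPlane-Bypass"] = "true"
    return headers


# Merge the result into the headers of an outgoing request.
print(override_headers(model="gpt-4o"))
```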