Smart Routing

How RelayPlane classifies tasks, scores complexity, and selects the optimal model for each request.

Routing Modes

The proxy supports four routing modes, controlled by routing.mode in config:

  • auto — Always classify and route by complexity, even when the client sends a specific model name. Recommended for maximum savings.
  • cascade — Start with a cheap model, escalate if the response shows uncertainty or refusal.
  • standard — Use learned routing rules from the core engine.
  • passthrough — No routing; forward requests as-is to the specified provider.

Complexity Classification

The classifier analyzes only the last user message (not system prompts) and assigns a numeric score:

Signal                  Score      Patterns
Code indicators         +2         Code blocks, function/class/const/let/import
Analytical tasks        +2         analyze, compare, evaluate, assess, review, audit
Math/logic              +2         calculate, compute, solve, equation, prove, derive
Multi-step              +2         first...then, step 1/2, phase N
Architecture            +3         architect, infrastructure, distributed, microservice, system design
Creative writing        +2         write a story/essay/article, create a, design a
Implementation          +2         implement, refactor, debug, optimize, migrate
Planning                +1         strategy, roadmap, plan for
Long content            +1 to +4   >500 tokens: +1, >2000: +2, >5000: +2
Multiple requirements   +1 to +2   ≥3 "and": +1, ≥5: +1 more

Score mapping: <2 = simple, 2-3 = moderate, ≥4 = complex
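
The scoring above can be sketched in a few lines of Python. This is a minimal illustration, not the proxy's actual implementation: the regular expressions are assumptions built from the keyword lists in the table, the function names are hypothetical, and the token-length signal is left out because it depends on a tokenizer.

```python
import re

# (signal name, points, pattern). The keywords come from the table above;
# the exact regular expressions are assumptions made for this sketch.
SIGNALS = [
    ("code", 2, r"```|\b(function|class|const|let|import)\b"),
    ("analytical", 2, r"\b(analyze|compare|evaluate|assess|review|audit)\b"),
    ("math", 2, r"\b(calculate|compute|solve|equation|prove|derive)\b"),
    ("multi_step", 2, r"\bfirst\b.*\bthen\b|\bstep\s*[12]\b|\bphase\s*\d+\b"),
    ("architecture", 3, r"\b(architect|infrastructure|distributed|microservice|system design)"),
    ("creative", 2, r"\bwrite an? (story|essay|article)\b|\bcreate a\b|\bdesign a\b"),
    ("implementation", 2, r"\b(implement|refactor|debug|optimize|migrate)\b"),
    ("planning", 1, r"\b(strategy|roadmap|plan for)\b"),
]


def score_message(text: str) -> int:
    """Sum the points of every signal that matches the last user message."""
    lowered = text.lower()
    score = sum(
        points
        for _name, points, pattern in SIGNALS
        if re.search(pattern, lowered, re.DOTALL)
    )
    # Multiple requirements: +1 at three or more "and", +1 more at five or more.
    and_count = len(re.findall(r"\band\b", lowered))
    if and_count >= 3:
        score += 1
    if and_count >= 5:
        score += 1
    return score


def classify(text: str) -> str:
    """Map the numeric score to a level: <2 simple, 2-3 moderate, >=4 complex."""
    score = score_message(text)
    if score < 2:
        return "simple"
    if score < 4:
        return "moderate"
    return "complex"
```

Under these assumed patterns, a prompt such as "Implement a distributed cache" matches both the Implementation (+2) and Architecture (+3) signals, so it lands in the complex tier.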

Task Type Inference

Separately from complexity, the core engine infers one of 9 task types from the prompt text. This is used for telemetry categorization and learned routing rules.

Cascade Escalation

In cascade mode, the proxy detects when a cheaper model's response is insufficient:

  • Uncertainty — phrases like "I'm not sure", "it's hard to say", "I can't definitively"
  • Refusal — phrases like "I can't help with that", "as an AI"
  • Error — provider returns an error

When escalation triggers, the request is re-sent to the next model in the cascade chain.

Cascade mode only works for non-streaming requests. Streaming requests automatically fall back to complexity-based routing.
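
A rough sketch of the escalation loop follows. It assumes a caller-supplied `call_model(model, prompt)` function (hypothetical) that returns the response text or raises on a provider error, and it uses plain substring matching on the phrases listed above. Returning the last model's answer even when it still looks uncertain is also an assumption of this sketch.

```python
from typing import Callable, Optional, Sequence, Tuple

# Phrases taken from the lists above; the real detector may use more patterns.
UNCERTAINTY_PHRASES = ("i'm not sure", "it's hard to say", "i can't definitively")
REFUSAL_PHRASES = ("i can't help with that", "as an ai")


def escalation_reason(response_text: str) -> Optional[str]:
    """Return "uncertainty" or "refusal" if the response should escalate, else None."""
    # Normalize curly apostrophes so both quote styles match.
    lowered = response_text.lower().replace("\u2019", "'")
    if any(phrase in lowered for phrase in UNCERTAINTY_PHRASES):
        return "uncertainty"
    if any(phrase in lowered for phrase in REFUSAL_PHRASES):
        return "refusal"
    return None


def run_cascade(
    prompt: str,
    chain: Sequence[str],
    call_model: Callable[[str, str], str],  # hypothetical provider call
) -> Tuple[str, str]:
    """Try each model in order, moving to the next one on uncertainty, refusal, or error."""
    last_error: Optional[Exception] = None
    for index, model in enumerate(chain):
        is_last = index == len(chain) - 1
        try:
            text = call_model(model, prompt)
        except Exception as exc:  # a provider error also triggers escalation
            last_error = exc
            continue
        # Accept a confident answer, or whatever the final model returned.
        if escalation_reason(text) is None or is_last:
            return model, text
    raise RuntimeError("cascade exhausted without an acceptable response") from last_error
```

The chain is ordered from cheapest to most capable, so each escalation trades cost for a better chance of a usable answer.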

Model Resolution

The proxy resolves models in this order:

  1. Check modelOverrides in config
  2. Check RelayPlane aliases (relayplane:auto, rp:balanced)
  3. Check smart aliases (rp:best, rp:fast, etc.)
  4. Check model mapping (sonnet → claude-sonnet-4-6)
  5. Auto-detect by prefix (claude-* → Anthropic, gpt-* → OpenAI, etc.)
  6. Try provider/model format (anthropic/claude-3-5-sonnet-latest)
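
The resolution order can be pictured as a chain of lookups that stops at the first match. The sketch below is illustrative only: the alias sets, mapping table, and prefix table hold just the examples named in the list, and `resolve_model` with its return shape is a hypothetical helper, not the proxy's API.

```python
from typing import Dict, Tuple

# Example entries only, copied from the list above; the real tables are larger.
RELAYPLANE_ALIASES = {"relayplane:auto", "rp:balanced"}
SMART_ALIASES = {"rp:best", "rp:fast"}
MODEL_MAPPING = {"sonnet": "claude-sonnet-4-6"}
PREFIX_PROVIDERS = {"claude-": "anthropic", "gpt-": "openai"}


def resolve_model(requested: str, config: Dict) -> Tuple[str, str]:
    """Return (rule that matched, resolved target), checking rules in documented order."""
    overrides = config.get("modelOverrides", {})
    if requested in overrides:                      # 1. modelOverrides in config
        return "override", overrides[requested]
    if requested in RELAYPLANE_ALIASES:             # 2. RelayPlane aliases
        return "relayplane-alias", requested
    if requested in SMART_ALIASES:                  # 3. smart aliases
        return "smart-alias", requested
    if requested in MODEL_MAPPING:                  # 4. model mapping
        return "mapping", MODEL_MAPPING[requested]
    for prefix, provider in PREFIX_PROVIDERS.items():
        if requested.startswith(prefix):            # 5. auto-detect by prefix
            return "prefix", f"{provider}/{requested}"
    if "/" in requested:                            # 6. provider/model format
        return "provider/model", requested
    raise ValueError(f"cannot resolve model: {requested!r}")
```

Because overrides are checked first, an entry in modelOverrides wins even for names that would otherwise match a prefix rule.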

Response Headers

Every proxied response includes routing metadata headers:

  • x-relayplane-routed-model — Actual model used
  • x-relayplane-requested-model — Original model requested
  • x-relayplane-complexity — Inferred complexity level
  • x-relayplane-provider — Provider used
  • x-relayplane-routing-mode — Which routing mode was applied
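
On the client side, these headers can be collected into a small dictionary for logging or debugging. The helper below is a sketch with a hypothetical name (`routing_metadata`); it works on any mapping of header names to values and lowercases the names first, since HTTP header names are case-insensitive.

```python
from typing import Dict, Mapping, Optional

# Header name -> short field name used in the returned dictionary.
ROUTING_HEADERS = {
    "x-relayplane-routed-model": "routed_model",
    "x-relayplane-requested-model": "requested_model",
    "x-relayplane-complexity": "complexity",
    "x-relayplane-provider": "provider",
    "x-relayplane-routing-mode": "routing_mode",
}


def routing_metadata(headers: Mapping[str, str]) -> Dict[str, Optional[str]]:
    """Extract the routing headers from a response; missing headers map to None."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return {field: lowered.get(name) for name, field in ROUTING_HEADERS.items()}
```

Comparing `requested_model` with `routed_model` shows at a glance whether the proxy rerouted a request.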

Suffix Routing

Append a routing suffix to any model name to override the routing strategy for that request:

  • claude-sonnet-4-6:cost — Force cost optimization
  • gpt-4o:quality — Force highest quality
  • claude-3-5-haiku:fast — Force fastest model

Header override: Use the X-RelayPlane-Model header to override the model for a single request without changing the request body. Use X-RelayPlane-Bypass: true to skip routing entirely.
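
The suffix convention in this section can be parsed by splitting on the last colon. The sketch below assumes the three suffixes shown (cost, quality, fast) are the full set and that names beginning with rp: or relayplane: are aliases rather than suffixed models, so that rp:fast is not misread as model "rp" with suffix "fast". Both assumptions and the helper name `split_suffix` are illustrative.

```python
from typing import Optional, Tuple

ROUTING_SUFFIXES = {"cost", "quality", "fast"}   # suffixes shown above
ALIAS_PREFIXES = {"rp", "relayplane"}            # assumed alias namespaces


def split_suffix(model: str) -> Tuple[str, Optional[str]]:
    """Split "name:suffix" into (name, suffix); return (model, None) if no suffix applies."""
    base, sep, suffix = model.rpartition(":")
    if sep and suffix in ROUTING_SUFFIXES and base not in ALIAS_PREFIXES:
        return base, suffix
    return model, None
```

For example, `split_suffix("claude-sonnet-4-6:cost")` yields the base model and the "cost" strategy, while a plain name such as "gpt-4o" passes through unchanged.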