Model Selection

How RelayPlane automatically selects the right model for each task to minimize costs without sacrificing quality.

Overview

RelayPlane analyzes each incoming request and classifies it by complexity. Simple tasks are routed to cheaper, faster models, while complex tasks use premium models. This happens transparently: the client tool needs no changes.

Task Classification

The proxy infers task type from the request content using pattern matching on the last user message. It recognizes 9 task types:

Task Type            Description
code_generation      Writing new code (implement, refactor, build)
code_review          Reviewing or auditing code
summarization        Condensing long content
analysis             Analytical and evaluative tasks
creative_writing     Stories, essays, articles
data_extraction      Pulling structured data from text
translation          Language translation
question_answering   Direct Q&A
general              Everything else
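
To make the idea of pattern matching on the last user message concrete, here is a minimal sketch. The regexes, their ordering, and the names `last_user_message` and `infer_task_type` are assumptions made for illustration; the proxy's actual rules are not listed in this document. It also assumes each message's content is a plain string.

```python
import re

# Hypothetical keyword patterns, checked in order; the first match wins.
# These are illustrative guesses, not RelayPlane's actual rules.
TASK_PATTERNS = [
    ("code_review", re.compile(r"\b(review|audit)\b.*\bcode\b|\bcode review\b", re.I)),
    ("code_generation", re.compile(r"\b(implement|refactor|build)\b", re.I)),
    ("summarization", re.compile(r"\b(summarize|summarise|condense)\b", re.I)),
    ("translation", re.compile(r"\btranslate\b", re.I)),
    ("data_extraction", re.compile(r"\b(extract|parse)\b", re.I)),
    ("creative_writing", re.compile(r"\b(story|essay|article|poem)\b", re.I)),
    ("analysis", re.compile(r"\b(analyze|analyse|compare|evaluate)\b", re.I)),
    ("question_answering",
     re.compile(r"^\s*(what|who|when|where|why|how)\b.*\?\s*$", re.I | re.S)),
]


def last_user_message(messages):
    """Return the content of the most recent message with role 'user'."""
    for message in reversed(messages):
        if message.get("role") == "user":
            return message.get("content", "")
    return ""


def infer_task_type(messages):
    """Classify a chat request by its last user message; fall back to 'general'."""
    text = last_user_message(messages)
    for task_type, pattern in TASK_PATTERNS:
        if pattern.search(text):
            return task_type
    return "general"
```

Because only the last user message is inspected, a system prompt that mentions translation does not change how a plain greeting is classified.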

Complexity Scoring

Each request is scored for complexity based on the last user message (not system prompts or conversation history). The scoring considers:

  • Code indicators — code blocks, function/class definitions (+2)
  • Analytical tasks — analyze, compare, evaluate, review (+2)
  • Math/logic — calculate, solve, prove, derive (+2)
  • Multi-step reasoning — step-by-step instructions (+2)
  • Architecture/design — system design, distributed systems (+3)
  • Implementation requests — implement, refactor, debug, optimize (+2)
  • Token length — longer messages score higher (+1 to +4)
  • Multiple requirements — many "and" conjunctions (+1 to +2)

The score maps to three complexity levels:

  • Simple (score < 2) → cheapest model
  • Moderate (score 2-3) → mid-tier model
  • Complex (score ≥ 4) → premium model
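
The scoring and level mapping above can be sketched roughly as follows. The weights and level boundaries come from the lists above; everything else is an assumption for illustration: the regexes, counting each signal at most once, the 4-characters-per-token estimate, and the cut-offs used for the length and "and" bonuses.

```python
import re

# Weights follow the list above; the regexes are illustrative assumptions.
SIGNALS = [
    (re.compile(r"```|\b(def|function|class)\s+\w+"), 2),                # code indicators
    (re.compile(r"\b(analyze|compare|evaluate|review)\b", re.I), 2),     # analytical tasks
    (re.compile(r"\b(calculate|solve|prove|derive)\b", re.I), 2),        # math/logic
    (re.compile(r"\bstep[- ]by[- ]step\b", re.I), 2),                    # multi-step reasoning
    (re.compile(r"\b(system design|distributed systems?)\b", re.I), 3),  # architecture/design
    (re.compile(r"\b(implement|refactor|debug|optimize)\b", re.I), 2),   # implementation
]


def length_bonus(text):
    """+1 to +4 for longer messages (assumed thresholds, ~4 chars per token)."""
    approx_tokens = len(text) // 4
    for threshold, bonus in ((2000, 4), (1000, 3), (500, 2), (100, 1)):
        if approx_tokens >= threshold:
            return bonus
    return 0


def conjunction_bonus(text):
    """+1 to +2 when the message contains many 'and' conjunctions (assumed cut-offs)."""
    count = len(re.findall(r"\band\b", text, re.I))
    if count >= 5:
        return 2
    if count >= 3:
        return 1
    return 0


def complexity_score(text):
    """Sum the weight of every signal that matches, plus the two bonuses."""
    score = sum(weight for pattern, weight in SIGNALS if pattern.search(text))
    return score + length_bonus(text) + conjunction_bonus(text)


def complexity_level(score):
    """Map a score to the three levels: <2 simple, 2-3 moderate, >=4 complex."""
    if score < 2:
        return "simple"
    if score < 4:
        return "moderate"
    return "complex"
```

Under these assumptions, a short message such as "Compare these two options" matches only the analytical signal and lands in the moderate tier.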

Default Model Mapping

The default complexity-to-model mapping (configurable in ~/.relayplane/config.json):

{
  "routing": {
    "complexity": {
      "enabled": true,
      "simple": "claude-sonnet-4-6",
      "moderate": "claude-sonnet-4-6",
      "complex": "claude-opus-4-6"
    }
  }
}
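
As a hedged illustration of how this mapping might be consumed, the sketch below reads the `routing.complexity` block and looks up the model for a level. The function names `load_complexity_routing` and `model_for` are hypothetical, and merging file values over the defaults is an assumption about how overrides behave.

```python
import json
from pathlib import Path

# Defaults copied from the mapping shown above.
DEFAULT_COMPLEXITY = {
    "enabled": True,
    "simple": "claude-sonnet-4-6",
    "moderate": "claude-sonnet-4-6",
    "complex": "claude-opus-4-6",
}


def load_complexity_routing(path=Path.home() / ".relayplane" / "config.json"):
    """Read routing.complexity from the config file, falling back to the defaults."""
    try:
        config = json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        # Missing, unreadable, or malformed file: use the defaults only.
        config = {}
    overrides = config.get("routing", {}).get("complexity", {})
    return {**DEFAULT_COMPLEXITY, **overrides}


def model_for(level, routing):
    """Return the configured model for a level, or None if complexity routing is off."""
    if not routing.get("enabled", True):
        return None
    return routing[level]
```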

Routing Modes

You can request a specific routing strategy using model aliases:

Alias             Behavior
rp:best           Always use highest quality model
rp:fast           Use fastest/cheapest model
rp:cheap          Use cheapest model available
rp:balanced       Complexity-based routing (default auto)
relayplane:auto   Same as rp:balanced
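
One way to picture alias handling is a lookup on the model name the client sends. This is a sketch under stated assumptions: the strategy labels and the name `resolve_strategy` are hypothetical, and passing a concrete model name through unchanged when routing.mode is not "auto" is assumed rather than documented here.

```python
# Alias names are copied from the table above; the strategy labels on the
# right are hypothetical names used only in this sketch.
ALIASES = {
    "rp:best": "best",
    "rp:fast": "fast",
    "rp:cheap": "cheap",
    "rp:balanced": "complexity",
    "relayplane:auto": "complexity",
}


def resolve_strategy(requested_model, mode=None):
    """Return (strategy, pinned_model) for the model name a client sent."""
    if requested_model in ALIASES:
        return ALIASES[requested_model], None
    if mode == "auto":
        # routing.mode "auto" routes by complexity even for explicit model names.
        return "complexity", None
    # Assumed: otherwise a concrete model name is passed through unchanged.
    return "passthrough", requested_model
```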

Cascade Mode

When routing.mode is set to "cascade", the proxy starts with a cheaper model and escalates to a more expensive one if the response shows signs of uncertainty or refusal:

{
  "routing": {
    "mode": "cascade",
    "cascade": {
      "enabled": true,
      "models": ["claude-sonnet-4-6", "claude-opus-4-6"],
      "escalateOn": "uncertainty",
      "maxEscalations": 1
    }
  }
}

Auto mode: Set routing.mode to "auto" to always route based on complexity, even when the client sends a specific model name. This is the recommended setting for maximum savings.
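
The cascade escalation described in this section can be sketched as a loop over the configured models. This is an illustration only: the uncertainty phrases, the names `looks_uncertain` and `cascade`, and the caller-supplied `call_model(model, prompt)` function are all hypothetical, and the proxy's real detection logic is not described in this document.

```python
import re

# Hypothetical uncertainty/refusal markers; the proxy's real detection
# rules are not documented here.
UNCERTAIN = re.compile(
    r"\b(i'm not sure|i am not sure|i don't know|i cannot help|i can't help)\b",
    re.I,
)


def looks_uncertain(response_text):
    """Return True if the response contains one of the assumed marker phrases."""
    return bool(UNCERTAIN.search(response_text))


def cascade(prompt, models, call_model, max_escalations=1):
    """Try models in order, escalating while the response looks uncertain.

    call_model(model, prompt) -> str is supplied by the caller.
    Returns (model_used, response_text).
    """
    if not models:
        raise ValueError("models must not be empty")
    # The first model plus at most max_escalations further models.
    attempts = models[: max_escalations + 1]
    for index, model in enumerate(attempts):
        response = call_model(model, prompt)
        is_last = index == len(attempts) - 1
        if is_last or not looks_uncertain(response):
            return model, response
```

With the two-model list from the config above and maxEscalations set to 1, the premium model is called only when the first response matches an uncertainty marker, so confident answers cost a single cheap call.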