Routing

Intelligent model selection based on capabilities, cost, and availability.

Overview

The routing engine selects the best model for each request based on:

  • Required capabilities — What the request needs (tool use, vision, etc.)
  • Cost preferences — Whether to optimize for cost or for quality
  • Provider availability — Real-time health checks
  • Fallback configuration — Automatic failover chains

Capabilities

Capability          Description
------------------  -------------------------------
chat                Basic chat completion
tool_use            Function/tool calling
vision              Image understanding
code                Code generation and analysis
long_context        100k+ token context window
streaming           Server-sent events streaming
structured_output   JSON mode or schema enforcement

Getting a Routing Decision

```bash
curl -X POST http://localhost:3001/v1/routing/route \
  -H "Content-Type: application/json" \
  -d '{
    "workspace_id": "ws_123",
    "required_capabilities": ["chat", "tool_use"],
    "prefer_cost": "low",
    "allow_fallback": true
  }'
```

Response:

```json
{
  "success": true,
  "selected_provider": "anthropic",
  "selected_model": "claude-3-5-sonnet",
  "rationale": "Selected for tool_use capability with lowest cost among capable models",
  "candidates": [
    {
      "provider": "anthropic",
      "model": "claude-3-5-sonnet",
      "score": 0.92,
      "capabilities_match": ["chat", "tool_use"],
      "price_tier": "medium"
    },
    {
      "provider": "openai",
      "model": "gpt-4o",
      "score": 0.88,
      "capabilities_match": ["chat", "tool_use"],
      "price_tier": "high"
    }
  ],
  "fallback_chain": ["openai:gpt-4o", "google:gemini-pro"]
}
```
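The same request can be issued programmatically. A minimal Python sketch using only the standard library, assuming the local gateway address shown above; `build_route_request` and `get_routing_decision` are illustrative names, not part of any SDK:

```python
import json
import urllib.request

# Local gateway address from the curl example above (assumed).
ROUTING_URL = "http://localhost:3001/v1/routing/route"

def build_route_request(workspace_id, required_capabilities,
                        prefer_cost="low", allow_fallback=True):
    """Assemble the routing request body shown above."""
    return {
        "workspace_id": workspace_id,
        "required_capabilities": required_capabilities,
        "prefer_cost": prefer_cost,
        "allow_fallback": allow_fallback,
    }

def get_routing_decision(payload):
    """POST the payload and return the decoded routing decision."""
    req = urllib.request.Request(
        ROUTING_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The decision's `selected_provider` and `selected_model` fields can then be used to issue the actual completion call.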

Scoring Algorithm

Each model is scored based on weighted factors:

```jsonc
// Default scoring weights
{
  "capability_match": 0.40,  // Must have required capabilities
  "cost_efficiency": 0.25,   // Lower cost = higher score
  "latency": 0.15,           // Faster = higher score
  "availability": 0.10,      // Current health status
  "quality": 0.10            // Historical success rate
}
```

```
score =
  (capability_score * 0.40) +
  (cost_score * 0.25) +
  (latency_score * 0.15) +
  (availability_score * 0.10) +
  (quality_score * 0.10)
```

Weights can be adjusted per-workspace via the routing configuration API.
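The weighted sum above translates directly into a small function. In this sketch each signal is assumed to be pre-normalized to the 0–1 range (how the engine normalizes raw cost or latency into a score is not specified here):

```python
# Default weights from the configuration above; they sum to 1.0.
DEFAULT_WEIGHTS = {
    "capability_match": 0.40,
    "cost_efficiency": 0.25,
    "latency": 0.15,
    "availability": 0.10,
    "quality": 0.10,
}

def score_model(signals, weights=DEFAULT_WEIGHTS):
    """Weighted sum of normalized (0-1) signals; higher is better."""
    return sum(weights[name] * signals[name] for name in weights)
```

A model with a perfect capability match but mediocre cost efficiency still scores well, because capability carries the largest weight.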

Fallback Chains

When a provider fails, the routing engine automatically tries the next model in the chain:

```jsonc
// Configure a fallback chain
{
  "id": "production-chain",
  "name": "Production Fallback",
  "models": [
    "anthropic:claude-3-5-sonnet",  // Primary
    "openai:gpt-4o",                // First fallback
    "openai:gpt-4o-mini",           // Cheaper fallback
    "google:gemini-pro"             // Last resort
  ],
  "failure_triggers": [
    "rate_limit",
    "timeout",
    "provider_error",
    "capacity_exceeded"
  ]
}
```
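The failover loop can be sketched as follows. `ProviderError` and `call_with_fallback` are hypothetical names for illustration; the engine's actual retry semantics (backoff, per-trigger behavior) may differ:

```python
# Failure triggers from the chain configuration above.
FAILURE_TRIGGERS = {"rate_limit", "timeout", "provider_error", "capacity_exceeded"}

class ProviderError(Exception):
    """Hypothetical error type carrying a failure-trigger reason."""
    def __init__(self, reason):
        super().__init__(reason)
        self.reason = reason

def call_with_fallback(chain, call_model):
    """Try each "provider:model" entry in order, advancing only on
    errors whose reason matches a configured failure trigger."""
    last_error = None
    for entry in chain:
        provider, model = entry.split(":", 1)
        try:
            return call_model(provider, model)
        except ProviderError as err:
            if err.reason not in FAILURE_TRIGGERS:
                raise  # non-retryable failure: surface it immediately
            last_error = err
    raise last_error  # every model in the chain failed
```

Note that only the listed triggers advance the chain; other errors (for example, an invalid request) propagate immediately rather than burning through fallbacks.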

Built-in Model Registry

RelayPlane includes a registry of major models with their capabilities:

```bash
curl http://localhost:3001/v1/routing/models
```

```jsonc
{
  "models": [
    {
      "id": "claude-3-5-sonnet",
      "provider": "anthropic",
      "capabilities": ["chat", "tool_use", "vision", "code", "long_context", "streaming"],
      "pricing": { "input_per_1k": 0.003, "output_per_1k": 0.015 },
      "context_window": 200000,
      "status": "available"
    },
    {
      "id": "gpt-4o",
      "provider": "openai",
      "capabilities": ["chat", "tool_use", "vision", "code", "streaming", "structured_output"],
      "pricing": { "input_per_1k": 0.005, "output_per_1k": 0.015 },
      "context_window": 128000,
      "status": "available"
    },
    {
      "id": "gpt-4o-mini",
      "provider": "openai",
      "capabilities": ["chat", "tool_use", "streaming", "structured_output"],
      "pricing": { "input_per_1k": 0.00015, "output_per_1k": 0.0006 },
      "context_window": 128000,
      "status": "available"
    }
    // ... more models
  ]
}
```
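With the registry in hand, a client can pre-filter candidates itself, for example to find the cheapest model that covers a capability set. A sketch over an abbreviated copy of the entries above; `cheapest_capable` is an illustrative helper, not an API of the registry:

```python
# Abbreviated copy of the registry entries shown above.
MODELS = [
    {"id": "claude-3-5-sonnet", "provider": "anthropic",
     "capabilities": {"chat", "tool_use", "vision", "code", "long_context", "streaming"},
     "pricing": {"input_per_1k": 0.003, "output_per_1k": 0.015}},
    {"id": "gpt-4o", "provider": "openai",
     "capabilities": {"chat", "tool_use", "vision", "code", "streaming", "structured_output"},
     "pricing": {"input_per_1k": 0.005, "output_per_1k": 0.015}},
    {"id": "gpt-4o-mini", "provider": "openai",
     "capabilities": {"chat", "tool_use", "streaming", "structured_output"},
     "pricing": {"input_per_1k": 0.00015, "output_per_1k": 0.0006}},
]

def cheapest_capable(models, required):
    """Models that have every required capability, cheapest input price first."""
    capable = [m for m in models if required <= m["capabilities"]]
    return sorted(capable, key=lambda m: m["pricing"]["input_per_1k"])
```

This mirrors the cost side of the server-side scoring, ignoring latency, availability, and quality.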

Provider Lanes

Models are categorized into lanes based on their provider type:

  • proprietary — OpenAI, Anthropic, Google (commercial APIs)
  • hosted — Together, Groq, Fireworks (hosted open models)
  • local — Ollama, LM Studio (self-hosted)
  • custom — Your own endpoints (vLLM, TGI, etc.)

Provider Health

The routing engine tracks provider health in real-time:

```jsonc
// Health statuses
{
  "anthropic": {
    "status": "healthy",
    "latency_p50_ms": 450,
    "success_rate": 0.998,
    "last_check": "2026-02-06T12:00:00Z"
  },
  "openai": {
    "status": "degraded",
    "latency_p50_ms": 1200,
    "success_rate": 0.95,
    "last_check": "2026-02-06T12:00:00Z",
    "issue": "Elevated latency detected"
  }
}
```
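The status values above suggest a simple classification over latency and success rate. A sketch with illustrative thresholds — the actual thresholds the engine uses are not documented here:

```python
def classify_health(latency_p50_ms, success_rate,
                    max_latency_ms=1000, min_success=0.99):
    """Classify a provider from its health metrics.
    Thresholds are assumptions for illustration only."""
    if success_rate < 0.90:
        return "unhealthy"   # failing too often to route to at all
    if latency_p50_ms > max_latency_ms or success_rate < min_success:
        return "degraded"    # usable, but penalized in scoring
    return "healthy"
```

Under these assumed thresholds, the two providers in the example above come out "healthy" and "degraded" respectively, matching the reported statuses.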

Next Steps