Routing
Intelligent model selection based on capabilities, cost, and availability.
Overview
The routing engine selects the best model for each request based on:
- Required capabilities — What the request needs (tool use, vision, etc.)
- Cost preferences — Whether to optimize for cost or for quality
- Provider availability — Real-time health checks
- Fallback configuration — Automatic failover chains
Capabilities
| Capability | Description |
|---|---|
| chat | Basic chat completion |
| tool_use | Function/tool calling |
| vision | Image understanding |
| code | Code generation and analysis |
| long_context | 100k+ token context window |
| streaming | Server-sent events streaming |
| structured_output | JSON mode or schema enforcement |
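Capability matching is a set-containment check: a model is a candidate only if its capability set covers everything the request requires. A minimal Python sketch, using capability names from the table above (the model data and helper are illustrative, not part of the API):

```python
# Illustrative capability sets for two models from the registry.
MODELS = {
    "claude-3-5-sonnet": {"chat", "tool_use", "vision", "code", "long_context", "streaming"},
    "gpt-4o-mini": {"chat", "tool_use", "streaming", "structured_output"},
}

def capable_models(required: set[str]) -> list[str]:
    """Return model ids whose capabilities cover the required set."""
    return [mid for mid, caps in MODELS.items() if required <= caps]

print(capable_models({"chat", "vision"}))  # only claude-3-5-sonnet qualifies
```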
Getting a Routing Decision
```bash
curl -X POST http://localhost:3001/v1/routing/route \
  -H "Content-Type: application/json" \
  -d '{
    "workspace_id": "ws_123",
    "required_capabilities": ["chat", "tool_use"],
    "prefer_cost": "low",
    "allow_fallback": true
  }'
```

Response:

```json
{
  "success": true,
  "selected_provider": "anthropic",
  "selected_model": "claude-3-5-sonnet",
  "rationale": "Selected for tool_use capability with lowest cost among capable models",
  "candidates": [
    {
      "provider": "anthropic",
      "model": "claude-3-5-sonnet",
      "score": 0.92,
      "capabilities_match": ["chat", "tool_use"],
      "price_tier": "medium"
    },
    {
      "provider": "openai",
      "model": "gpt-4o",
      "score": 0.88,
      "capabilities_match": ["chat", "tool_use"],
      "price_tier": "high"
    }
  ],
  "fallback_chain": ["openai:gpt-4o", "google:gemini-pro"]
}
```

Scoring Algorithm
Each model is scored based on weighted factors:
```js
// Default scoring weights
{
  "capability_match": 0.40,  // Must have required capabilities
  "cost_efficiency": 0.25,   // Lower cost = higher score
  "latency": 0.15,           // Faster = higher score
  "availability": 0.10,      // Current health status
  "quality": 0.10            // Historical success rate
}

// Score calculation
score =
  (capability_score * 0.40) +
  (cost_score * 0.25) +
  (latency_score * 0.15) +
  (availability_score * 0.10) +
  (quality_score * 0.10)
```

Weights can be adjusted per-workspace via the routing configuration API.
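The weighted sum above can be sketched directly in Python. The per-factor scores passed in below are made-up inputs in [0, 1], purely for illustration; only the weights come from the defaults:

```python
# Default scoring weights from the routing engine.
WEIGHTS = {
    "capability_match": 0.40,
    "cost_efficiency": 0.25,
    "latency": 0.15,
    "availability": 0.10,
    "quality": 0.10,
}

def score(factors: dict[str, float]) -> float:
    """Weighted sum of per-factor scores (each in [0, 1]); missing factors score 0."""
    return sum(WEIGHTS[name] * factors.get(name, 0.0) for name in WEIGHTS)

# Illustrative factor scores (not real measurements):
print(round(score({
    "capability_match": 1.0,
    "cost_efficiency": 0.8,
    "latency": 0.9,
    "availability": 1.0,
    "quality": 0.95,
}), 3))  # → 0.93
```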
Fallback Chains
When a provider fails, the routing engine automatically tries the next model in the chain:
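The failover loop can be sketched as follows: try each model in order and advance only when a configured failure trigger fires. The `call_model` stub and helper names are illustrative, not part of the API:

```python
FAILURE_TRIGGERS = {"rate_limit", "timeout", "provider_error", "capacity_exceeded"}

class ProviderError(Exception):
    """Hypothetical error carrying the failure trigger name."""
    def __init__(self, trigger: str):
        self.trigger = trigger

def route_with_fallback(chain, call_model):
    """Try each model in order; advance only on retryable failure triggers."""
    last = None
    for model in chain:
        try:
            return call_model(model)
        except ProviderError as e:
            if e.trigger not in FAILURE_TRIGGERS:
                raise  # non-retryable: surface immediately
            last = e
    raise RuntimeError(f"all models failed; last trigger: {last.trigger}")

# Usage: the primary is rate-limited, so the first fallback serves the request.
def fake_call(model):
    if model == "anthropic:claude-3-5-sonnet":
        raise ProviderError("rate_limit")
    return f"ok from {model}"

print(route_with_fallback(["anthropic:claude-3-5-sonnet", "openai:gpt-4o"], fake_call))
```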
```js
// Configure a fallback chain
{
  "id": "production-chain",
  "name": "Production Fallback",
  "models": [
    "anthropic:claude-3-5-sonnet",  // Primary
    "openai:gpt-4o",                // First fallback
    "openai:gpt-4o-mini",           // Cheaper fallback
    "google:gemini-pro"             // Last resort
  ],
  "failure_triggers": [
    "rate_limit",
    "timeout",
    "provider_error",
    "capacity_exceeded"
  ]
}
```

Built-in Model Registry
RelayPlane includes a registry of major models with their capabilities:
```bash
curl http://localhost:3001/v1/routing/models
```

```json
{
  "models": [
    {
      "id": "claude-3-5-sonnet",
      "provider": "anthropic",
      "capabilities": ["chat", "tool_use", "vision", "code", "long_context", "streaming"],
      "pricing": { "input_per_1k": 0.003, "output_per_1k": 0.015 },
      "context_window": 200000,
      "status": "available"
    },
    {
      "id": "gpt-4o",
      "provider": "openai",
      "capabilities": ["chat", "tool_use", "vision", "code", "streaming", "structured_output"],
      "pricing": { "input_per_1k": 0.005, "output_per_1k": 0.015 },
      "context_window": 128000,
      "status": "available"
    },
    {
      "id": "gpt-4o-mini",
      "provider": "openai",
      "capabilities": ["chat", "tool_use", "streaming", "structured_output"],
      "pricing": { "input_per_1k": 0.00015, "output_per_1k": 0.0006 },
      "context_window": 128000,
      "status": "available"
    }
    // ... more models
  ]
}
```

Provider Lanes
Models are categorized into lanes based on their provider type:
- proprietary — OpenAI, Anthropic, Google (commercial APIs)
- hosted — Together, Groq, Fireworks (hosted open models)
- local — Ollama, LM Studio (self-hosted)
- custom — Your own endpoints (vLLM, TGI, etc.)
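A lane lookup can be as simple as a provider-to-lane table. The providers below are taken from the list above; the helper and the default-to-`custom` behavior are illustrative assumptions:

```python
# Provider-to-lane table (providers from the lists above).
LANES = {
    "openai": "proprietary", "anthropic": "proprietary", "google": "proprietary",
    "together": "hosted", "groq": "hosted", "fireworks": "hosted",
    "ollama": "local", "lmstudio": "local",
}

def lane_for(provider: str) -> str:
    # Assumption: unknown providers fall into the custom lane (your own endpoints).
    return LANES.get(provider, "custom")

print(lane_for("groq"))     # hosted
print(lane_for("my-vllm"))  # custom
```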
Provider Health
The routing engine tracks provider health in real-time:
```js
// Health statuses
{
  "anthropic": {
    "status": "healthy",
    "latency_p50_ms": 450,
    "success_rate": 0.998,
    "last_check": "2026-02-06T12:00:00Z"
  },
  "openai": {
    "status": "degraded",
    "latency_p50_ms": 1200,
    "success_rate": 0.95,
    "last_check": "2026-02-06T12:00:00Z",
    "issue": "Elevated latency detected"
  }
}
```
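One way a status could be derived from these metrics is simple thresholding on p50 latency and success rate. The cutoffs below are assumptions chosen to match the example payload above, not documented values:

```python
def classify_health(latency_p50_ms: float, success_rate: float) -> str:
    # Assumed thresholds, for illustration only.
    if success_rate < 0.90:
        return "unhealthy"
    if latency_p50_ms > 1000 or success_rate < 0.99:
        return "degraded"
    return "healthy"

print(classify_health(450, 0.998))  # healthy, as in the anthropic entry
print(classify_health(1200, 0.95))  # degraded, as in the openai entry
```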