How Routing Savings Are Calculated
Every number RelayPlane shows you has a formula behind it. This page makes that formula explicit so you can decide whether it applies to your workload.
The Formula
Savings are calculated per request and summed across a session or time window:
// Per-request savings
savings = default_model_cost_per_request - actual_routed_cost_per_request
// Total savings over N requests
total_savings = Σ(savings_i) for i in 1..N
- default_model: The model in your RelayPlane config, typically claude-opus-4-6. Its cost is what every request would have cost if routing were disabled.
- actual_model: The model RelayPlane actually sent the request to, based on its complexity classification.
- cost: Computed as (input_tokens * input_price) + (output_tokens * output_price) using current provider pricing. Input and output are priced separately.
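Here is the formula as a short Python sketch. The model names and per-million-token prices are hypothetical placeholders. They are not RelayPlane's configuration or any provider's current price list. The sketch also assumes the default model would have used the same input and output token counts as the routed model.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per million tokens. Not real RelayPlane or provider values.
PRICES_PER_MTOK = {
    "default-model": {"input": 5.00, "output": 25.00},
    "cheaper-model": {"input": 3.00, "output": 15.00},
}

@dataclass
class Request:
    input_tokens: int
    output_tokens: int
    actual_model: str  # model the request was routed to

def cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """(input_tokens * input_price) + (output_tokens * output_price), in USD."""
    p = PRICES_PER_MTOK[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

def savings(req: Request, default_model: str) -> float:
    """Per-request savings: default-model cost minus the actual routed cost."""
    # Assumption: the default model would have seen the same token counts.
    default_cost = cost(default_model, req.input_tokens, req.output_tokens)
    actual_cost = cost(req.actual_model, req.input_tokens, req.output_tokens)
    return default_cost - actual_cost

def total_savings(requests: list[Request], default_model: str) -> float:
    """Sum of per-request savings over N requests."""
    return sum(savings(r, default_model) for r in requests)
```

A request that stays on the default model contributes exactly zero, so only requests that were routed add to the total.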
Cache hits reduce the actual cost for a request, but they are tracked as a separate line item. Routing savings and cache savings are never combined into a single figure.
Worked Example
A request comes in with a short prompt and low estimated complexity. RelayPlane classifies it as "simple" and routes it to Claude Sonnet instead of Opus.
// Savings for this one request: Opus cost minus Sonnet cost
savings = $0.02100 - $0.00960 = $0.01140
// Scale to 10,000 similar requests
total_savings = $0.01140 * 10,000 = $114.00
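You can check the worked example's arithmetic with Python's `decimal` module. It avoids the binary floating-point rounding that appears when money is stored as `float`. Like the example, it assumes all 10,000 requests cost the same as this one.

```python
from decimal import Decimal

default_cost = Decimal("0.02100")  # what the request would have cost on Opus
routed_cost = Decimal("0.00960")   # what it cost on Sonnet

per_request = default_cost - routed_cost
total = per_request * 10_000       # assumes 10,000 requests with identical costs

print(per_request)      # 0.01140
print(f"${total:.2f}")  # $114.00
```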
Real Numbers
In testing over 10,000 requests using typical agentic workloads, 73.4% of requests were routable to a cheaper model without changing the task type. That fraction drives the cost reduction figure RelayPlane reports on the benchmarks page.
The 73.4% is not a universal number. It reflects our test suite, which skews toward mixed-complexity agentic sessions. A workload that is entirely long-form synthesis will route less. A workload full of short classification or summarization tasks will route more. Your number will differ.
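You can measure your own routed share from logged requests. It is worth computing alongside the cost reduction, because the two are different numbers: a routed request still costs something. The sketch below assumes hypothetical per-request records with `actual_model`, `default_cost`, and `actual_cost` fields. It counts requests that were actually routed. That is a proxy for the "routable" share measured in testing, not the same thing.

```python
def routed_fraction(records: list[dict], default_model: str) -> float:
    """Fraction of logged requests sent to a model other than the default."""
    if not records:
        return 0.0
    routed = sum(1 for r in records if r["actual_model"] != default_model)
    return routed / len(records)

def cost_reduction(records: list[dict]) -> float:
    """Total savings as a share of what all requests would have cost on the default model."""
    default_total = sum(r["default_cost"] for r in records)
    actual_total = sum(r["actual_cost"] for r in records)
    return (default_total - actual_total) / default_total if default_total else 0.0
```

For example, suppose three of four requests are routed at half the default cost. Then 75% of requests are routed, but total cost falls by only 37.5%.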
What "Acceptable Quality" Means
Currently, routing decisions are based on complexity heuristics: token count, request pattern, and an estimated difficulty score derived from prompt structure. If a request looks simple, it goes to a cheaper model.
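Here is a toy classifier that combines the three kinds of signal named above, to make heuristic routing concrete. Everything in it is hypothetical. The keyword pattern, the scoring rule, and the thresholds are invented for illustration and are not RelayPlane's actual heuristics.

```python
import re

# Hypothetical "request pattern": verbs that often mark short, well-scoped tasks.
SIMPLE_TASK = re.compile(r"\b(classify|summari[sz]e|extract|translate)\b", re.IGNORECASE)

# Hypothetical thresholds, chosen only to make the example concrete.
MAX_SIMPLE_TOKENS = 500
MAX_SIMPLE_DIFFICULTY = 0.3

def difficulty_score(prompt: str) -> float:
    """Toy difficulty estimate from prompt structure: more lines and numbered steps score higher."""
    lines = prompt.count("\n") + 1
    numbered_steps = len(re.findall(r"^\s*\d+\.", prompt, re.MULTILINE))
    return min(1.0, 0.02 * lines + 0.1 * numbered_steps)

def classify(prompt: str, token_count: int) -> str:
    """Label a request 'simple' only if every signal points that way."""
    if (token_count <= MAX_SIMPLE_TOKENS
            and SIMPLE_TASK.search(prompt)
            and difficulty_score(prompt) < MAX_SIMPLE_DIFFICULTY):
        return "simple"
    return "complex"

def route(prompt: str, token_count: int, default_model: str, cheaper_model: str) -> str:
    """Send 'simple' requests to the cheaper model and everything else to the default."""
    return cheaper_model if classify(prompt, token_count) == "simple" else default_model
```

Because all signals must agree, ambiguous requests stay on the default model. This trades some savings for fewer misroutes.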
Phase 2 adds outcome grading, where sampled routed calls are evaluated against the default model output to verify quality empirically. That data will be published here when it is ready.
The routing is probabilistic. Some routed calls will produce output that differs from what Opus would have returned. The aggregate cost reduction is real. Individual call quality varies, and that tradeoff is yours to make.
What This Does NOT Measure
- Whether the cheaper model's output was equally good for your specific use case. Only you can judge that.
- Background or suspected automated requests. These are tracked separately and excluded from routing savings.
- Cache savings. Prompt cache hits reduce cost but are reported as a separate line item, never folded into routing savings.
- Latency impact. Cheaper models are sometimes faster, sometimes not. That is not tracked here.
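The sketch below shows one way to apply these exclusions when summarizing logged requests. It assumes a hypothetical record with a pre-cache routed cost, a separate cache-savings amount, and a background flag. The field names are illustrative. Excluding background requests from cache savings as well is an assumption; this page only states that they are excluded from routing savings.

```python
from dataclasses import dataclass

@dataclass
class LoggedRequest:
    # Hypothetical record layout, for illustration only.
    default_cost: float   # what the request would have cost on the default model
    routed_cost: float    # cost on the model actually used, before any cache discount
    cache_savings: float  # reduction from prompt-cache hits, tracked on its own
    background: bool      # flagged as background or suspected automated traffic

def summarize(requests: list[LoggedRequest]) -> dict:
    """Keep routing savings, cache savings, and background traffic as separate figures."""
    foreground = [r for r in requests if not r.background]
    return {
        # Routing savings use pre-cache costs, so cache hits never leak into this number.
        "routing_savings": sum(r.default_cost - r.routed_cost for r in foreground),
        # Assumption: background requests are also left out of cache savings here.
        "cache_savings": sum(r.cache_savings for r in foreground),
        "background_requests": len(requests) - len(foreground),
    }
```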
How to Verify
You do not have to take these numbers on faith. RelayPlane logs every request with the model used, token counts, and computed cost. Nothing is aggregated before you see it.
- Dashboard: localhost:4100 shows per-session and per-request breakdowns, including which model was used and what it cost.
- Logs: Each proxied request is written to a local SQLite database. You can query it directly to audit any calculation.
- Privacy: All data stays on your machine. Nothing is sent to RelayPlane servers unless you explicitly enable cloud sync.
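Auditing a logged cost could look like the sketch below. This page does not document RelayPlane's database path, table, or column names. The `requests` table, its columns, and the model names and prices here are therefore hypothetical. Inspect your own database (for example with the sqlite3 shell's `.schema` command) and adapt the query. An in-memory database stands in for the real file.

```python
import sqlite3

# Hypothetical per-token prices in USD (input, output); replace with current provider pricing.
PRICES = {"model-a": (5e-6, 25e-6), "model-b": (3e-6, 15e-6)}

def audit(conn: sqlite3.Connection, tolerance: float = 1e-9) -> list[int]:
    """Return ids of rows whose logged cost differs from the recomputed cost."""
    mismatches = []
    rows = conn.execute("SELECT id, model, input_tokens, output_tokens, cost FROM requests")
    for rid, model, tin, tout, logged in rows:
        in_price, out_price = PRICES[model]
        expected = tin * in_price + tout * out_price
        if abs(expected - logged) > tolerance:
            mismatches.append(rid)
    return mismatches

# In-memory stand-in for the local database, using a hypothetical table layout.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (id INTEGER PRIMARY KEY, model TEXT, "
             "input_tokens INTEGER, output_tokens INTEGER, cost REAL)")
conn.executemany("INSERT INTO requests VALUES (?, ?, ?, ?, ?)", [
    (1, "model-b", 1000, 500, 0.0105),
    (2, "model-a", 1000, 500, 0.0200),  # deliberately wrong: should be 0.0175
])
print(audit(conn))  # [2]
```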
See Your Own Numbers
Install RelayPlane and route your next session. The dashboard shows exactly what each request cost and what it would have cost without routing.