Benchmarks / Methodology
How Routing Savings Are Calculated
I want to be precise about what the savings numbers mean, how they are calculated, and where the methodology has gaps. If you are evaluating RelayPlane, this page is for you.
The Formula
Routing savings are calculated per request and summed across a session or time window:
// Per-request savings
savings = default_model_cost - actual_routed_cost
// Summed over N requests
total_savings = Σ(default_model_costₙ - actual_routed_costₙ)
- default_model: The model in your RelayPlane config; its pricing defines what every request would cost if routing were disabled. Typically claude-opus-4-6, but it is whatever you set as your default.
- actual_model: The model RelayPlane selected based on complexity classification. For simple requests, this is a cheaper model; for complex requests, it stays on the default.
- cost: Token count times provider pricing, calculated separately for input and output tokens. Cache hits reduce actual cost but are tracked in a separate column.
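The formula can be sketched in a few lines of code. The function and field names here are illustrative, not RelayPlane's actual API:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Cost of one request: per-token price applied separately to input and output."""
    return input_tokens * input_price + output_tokens * output_price

def routing_savings(requests: list[dict],
                    default_prices: tuple[float, float]) -> float:
    """Sum of (default_model_cost - actual_routed_cost) across requests.

    Each request dict carries token counts plus the per-token prices of the
    model actually used. Cache hits are deliberately ignored, matching the
    methodology: they are tracked in a separate column.
    """
    total = 0.0
    for r in requests:
        default_cost = request_cost(r["input_tokens"], r["output_tokens"],
                                    *default_prices)
        actual_cost = request_cost(r["input_tokens"], r["output_tokens"],
                                   r["input_price"], r["output_price"])
        total += default_cost - actual_cost
    return total
```

A request that stays on the default model contributes zero savings, since its actual prices equal the default prices.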
Worked Example
A request classified as simple (short input, low complexity score) gets routed to Sonnet instead of Opus. Here is the math:
Request: 1,200 input tokens, 400 output tokens
Opus (default)
1,200 tokens @ $0.000015 = $0.018
400 tokens @ $0.000075 = $0.03
Total: $0.048
Sonnet (routed)
1,200 tokens @ $0.0000025 = $0.003
400 tokens @ $0.000003 = $0.0012
Total: $0.0042
That is $0.0438 saved on a single request. At 10,000 requests per day, that is about $438/day. The savings compound fast when most of your workload is straightforward tasks that do not need the flagship model.
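The worked example can be checked in a few lines. The per-token prices are taken straight from the example above; the model names are just labels:

```python
# Per-token prices from the worked example.
OPUS = {"input": 0.000015, "output": 0.000075}     # default model
SONNET = {"input": 0.0000025, "output": 0.000003}  # routed model

def cost(prices: dict, input_tokens: int, output_tokens: int) -> float:
    """Input and output tokens priced separately, then summed."""
    return input_tokens * prices["input"] + output_tokens * prices["output"]

opus_cost = cost(OPUS, 1200, 400)      # 0.018 + 0.03   = 0.048
sonnet_cost = cost(SONNET, 1200, 400)  # 0.003 + 0.0012 = 0.0042
per_request = opus_cost - sonnet_cost  # 0.0438
daily = per_request * 10_000           # ~438 dollars/day at 10K routed requests
```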
Real Numbers From Our Usage
- 73.4% of requests routable to cheaper models
- 10K+ requests measured
- real cost reduction vs default model
These numbers reflect our usage patterns, with a mix of short one-off questions, medium file edits, and complex multi-step reasoning tasks. Your routing ratio will vary based on what your agents actually do. The dashboard breaks this down per session so you can see your own split.
What "Acceptable Quality" Means
This is the honest part. Right now, routing decisions are based on complexity heuristics: token count, request pattern, estimated difficulty score. The proxy does not verify whether the cheaper model actually produced a good response.
Phase 1 (now)
Heuristic routing based on request complexity. Fast, zero latency overhead, no quality verification.
Phase 2 (planned)
Outcome grading to empirically verify quality across routed calls. This will give us a quality-adjusted savings number instead of a raw cost number.
The routing is probabilistic. Some routed calls may produce different quality output than Opus would have. The aggregate cost reduction is real. Individual call quality varies, and we are building the tooling to measure that properly.
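As an illustration of what Phase 1 heuristic routing looks like, here is a toy classifier. The thresholds, score weights, and keyword list are invented for this sketch and are not RelayPlane's actual rules:

```python
def complexity_score(prompt: str) -> float:
    """Toy difficulty estimate: longer prompts and multi-step language score higher.

    Hypothetical weights for illustration only.
    """
    token_estimate = len(prompt.split())
    multi_step = any(kw in prompt.lower()
                     for kw in ("step by step", "plan", "refactor"))
    return token_estimate / 1000 + (0.5 if multi_step else 0.0)

def route(prompt: str, default_model: str = "claude-opus-4-6",
          cheap_model: str = "claude-sonnet") -> str:
    """Route simple requests to the cheaper model; keep complex ones on the default.

    Note what is absent: no check of the response. The decision is made
    entirely from request-side signals, which is exactly the Phase 1 gap.
    """
    return cheap_model if complexity_score(prompt) < 0.4 else default_model
```

The gap Phase 2 closes is visible in the code: nothing here ever looks at the output, so a misclassified request is only caught by cost accounting, not quality measurement.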
What This Does NOT Measure
- Whether the cheaper model's output was equally good for your specific use case. That depends on your tasks.
- Background or suspected automated requests. These are tracked separately and excluded from savings calculations.
- Cache savings. Prompt cache hits reduce cost but are reported in a separate column from routing savings.
- Quality degradation. We report cost savings, not quality-adjusted value. Those are different things.
How to Verify
You do not have to take my word for it. RelayPlane logs every request with the model used, token counts, and calculated cost. All of this stays local.
- Start the dashboard.
- Check the per-session breakdown. The sessions view shows model mix, token counts, and savings per session. Drill into any session to see individual requests.
All data stays local
The proxy stores everything in a local SQLite database. Nothing leaves your machine unless you opt into cloud telemetry.
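Because everything lives in a local SQLite database, you can recompute the savings yourself rather than trusting the dashboard. The table and column names below are hypothetical; inspect your actual database for the real schema. The query pattern is the point:

```python
import sqlite3

# Hypothetical schema, stood up in memory so the example is self-contained.
# Your local RelayPlane database will have its own table and column names.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE requests (
        model        TEXT,
        default_cost REAL,  -- what the request would have cost on the default model
        actual_cost  REAL   -- what the routed model actually cost
    )
""")
conn.executemany(
    "INSERT INTO requests VALUES (?, ?, ?)",
    [("claude-sonnet", 0.048, 0.0042),    # simple request, routed cheaper
     ("claude-opus-4-6", 0.048, 0.048)],  # complex request, stayed on default
)

# Recompute total routing savings independently of the dashboard.
(total_savings,) = conn.execute(
    "SELECT SUM(default_cost - actual_cost) FROM requests"
).fetchone()
```

If your independently computed total disagrees with the dashboard, that is exactly the kind of hole worth reporting.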
Questions about the methodology? Open an issue or reach out directly. I would rather you poke holes in this now than trust numbers you have not validated.