Semantic Routing

An optional local sidecar that classifies each prompt with a fine-tuned language model, so requests are routed to the right model more accurately than with regex patterns.

What It Is

By default, RelayPlane uses regex heuristics to classify prompts as chat, completion, or code. The semantic router replaces those heuristics with a local ModernBERT-base model (Apache 2.0) that classifies the task type and recommends the best model from your available pool. The proxy falls back to regex automatically if the sidecar is unreachable, times out, or returns a result whose confidence is below the configured threshold.

The sidecar runs entirely on your machine. No prompts leave your network.
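The following toy classifier shows roughly what the default regex path does for the three categories named above. The patterns and the `classifyByRegex` name are hypothetical and are not RelayPlane's actual rules. The sketch only illustrates why keyword matching is brittle.

```typescript
// Toy regex classifier for the three default categories.
// HYPOTHETICAL: these patterns are illustrative, not RelayPlane's real rules.
type TaskType = "chat" | "completion" | "code";

const CODE_PATTERNS: RegExp[] = [
  /`{3}/, // fenced code block
  /\b(function|def|class|import|const|let|return)\b/, // common language keywords
  /\b(refactor|debug|stack trace|compile)\b/i, // code-related requests
];

// A prompt ending in "..." (or starting with "complete") looks like text to continue.
const COMPLETION_PATTERN = /\.\.\.$|^complete\b/i;

function classifyByRegex(prompt: string): TaskType {
  const text = prompt.trim();
  if (CODE_PATTERNS.some((re) => re.test(text))) return "code";
  if (COMPLETION_PATTERN.test(text)) return "completion";
  return "chat";
}
```

With these patterns, a prompt like "what's on my class schedule?" is labeled code because `\bclass\b` matches. That is the kind of false positive a learned classifier is meant to avoid.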

Requirements

  • Node 18+ for the proxy (built-in fetch and AbortController)
  • A running sidecar that exposes POST /v1/route and GET /health
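To try the proxy before installing the model, you could point it at a stand-in that answers the two endpoints listed above. The sketch below uses Node's built-in `node:http`. The request body (`{ prompt }`) and the response fields (`task`, `recommendedModel`, `confidence`) are assumptions about the contract, not the reference sidecar's documented schema, and `stub-model` is a placeholder.

```typescript
import { createServer } from "node:http";

// HYPOTHETICAL response shape; the real sidecar's field names may differ.
interface RouteResult {
  task: string;
  recommendedModel: string;
  confidence: number;
}

interface HandlerResponse {
  status: number;
  body: unknown;
}

// Pure handler so the endpoint contract can be tested without opening a socket.
function handle(method: string, path: string, rawBody: string): HandlerResponse {
  if (method === "GET" && path === "/health") {
    return { status: 200, body: { status: "ok" } };
  }
  if (method === "POST" && path === "/v1/route") {
    let parsed: unknown;
    try {
      parsed = JSON.parse(rawBody);
    } catch {
      return { status: 400, body: { error: "invalid JSON" } };
    }
    const prompt = (parsed as { prompt?: unknown } | null)?.prompt;
    if (typeof prompt !== "string") {
      return { status: 400, body: { error: "missing prompt" } };
    }
    // Fixed low-confidence answer: below the default 0.65 threshold.
    const result: RouteResult = { task: "chat", recommendedModel: "stub-model", confidence: 0.5 };
    return { status: 200, body: result };
  }
  return { status: 404, body: { error: "not found" } };
}

// Wire the pure handler to Node's built-in HTTP server.
function startStub(port = 8888) {
  const server = createServer((req, res) => {
    let raw = "";
    req.on("data", (chunk: Buffer) => {
      raw += chunk.toString("utf8");
    });
    req.on("end", () => {
      const path = (req.url ?? "/").split("?")[0];
      const { status, body } = handle(req.method ?? "GET", path, raw);
      res.writeHead(status, { "Content-Type": "application/json" });
      res.end(JSON.stringify(body));
    });
  });
  server.listen(port);
  return server;
}
```

Because the stub always reports confidence 0.5, a proxy using the default threshold of 0.65 should reject every result and fall back to regex. This makes the stub a quick way to exercise the fallback path.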

Start the Sidecar

The reference sidecar is built on ModernBERT-base. Start it locally:

Docker

docker run -p 8888:8888 relayplane/semantic-router-sidecar:latest

pip

pip install relayplane-sidecar
relayplane-sidecar --port 8888

The sidecar runs on CPU by default. A CUDA-capable GPU reduces latency from ~80 ms to ~10 ms but is not required.
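Once the sidecar is running, a quick probe of GET /health confirms it is reachable and gives a rough round-trip time. This is a hypothetical helper built on the Node 18+ globals `fetch`, `AbortController`, and `performance.now()`. The `fetchImpl` parameter exists only so the function can be exercised without a live sidecar.

```typescript
// HYPOTHETICAL helper: probe GET /health and time the round trip.
type FetchLike = (url: string, init?: { signal?: AbortSignal }) => Promise<{ ok: boolean }>;

async function checkSidecarHealth(
  baseUrl: string,
  timeoutMs = 2000,
  fetchImpl: FetchLike = fetch,
): Promise<{ ok: boolean; latencyMs: number }> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs); // give up after timeoutMs
  const started = performance.now();
  try {
    const res = await fetchImpl(new URL("/health", baseUrl).toString(), {
      signal: controller.signal,
    });
    return { ok: res.ok, latencyMs: performance.now() - started };
  } catch {
    // Connection refused, DNS failure, or aborted by the timeout.
    return { ok: false, latencyMs: performance.now() - started };
  } finally {
    clearTimeout(timer);
  }
}

// Usage: checkSidecarHealth("http://localhost:8888").then((r) => console.log(r));
```

The /health round trip covers network and server overhead only, not model inference. To see the classification latency itself, time a POST /v1/route request in the same way.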

Configure the Proxy

Set these environment variables before starting RelayPlane:

  • RELAYPLANE_SIDECAR_URL (default: unset): Base URL of the sidecar, e.g. http://localhost:8888. If unset, the sidecar is disabled and regex classification is used.
  • RELAYPLANE_SIDECAR_CONFIDENCE_THRESHOLD (default: 0.65): Minimum confidence (0 to 1) required to accept a sidecar result. Results below this threshold fall back to regex.
  • RELAYPLANE_SIDECAR_TIMEOUT_MS (default: 200): Request timeout in milliseconds, clamped to [50, 2000].

For example:

export RELAYPLANE_SIDECAR_URL=http://localhost:8888
export RELAYPLANE_SIDECAR_CONFIDENCE_THRESHOLD=0.70
export RELAYPLANE_SIDECAR_TIMEOUT_MS=150
relayplane start
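Below is a sketch of how a proxy might read and validate these variables. The defaults (unset, 0.65, 200) and the [50, 2000] clamp come from the descriptions above. `readSidecarConfig` is a hypothetical name, and treating blank, non-numeric, or out-of-range thresholds as "use the default" is an assumption rather than documented behavior.

```typescript
// HYPOTHETICAL config reader; defaults and the timeout clamp follow the docs.
interface SidecarConfig {
  url: string | null; // null => sidecar disabled, regex only
  confidenceThreshold: number;
  timeoutMs: number;
}

// Treat unset or blank values as missing (Number("") would otherwise be 0).
function parseNumber(value: string | undefined): number {
  return value === undefined || value.trim() === "" ? NaN : Number(value);
}

function readSidecarConfig(env: Record<string, string | undefined>): SidecarConfig {
  const rawUrl = env.RELAYPLANE_SIDECAR_URL?.trim();
  // Assumption: trailing slashes are stripped so paths can be appended safely.
  const url = rawUrl ? rawUrl.replace(/\/+$/, "") : null;

  const threshold = parseNumber(env.RELAYPLANE_SIDECAR_CONFIDENCE_THRESHOLD);
  const confidenceThreshold =
    Number.isFinite(threshold) && threshold >= 0 && threshold <= 1 ? threshold : 0.65;

  const timeout = parseNumber(env.RELAYPLANE_SIDECAR_TIMEOUT_MS);
  const timeoutMs = Number.isFinite(timeout) ? Math.min(2000, Math.max(50, timeout)) : 200;

  return { url, confidenceThreshold, timeoutMs };
}

// Usage: const cfg = readSidecarConfig(process.env);
```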

Fallback Behavior

The proxy silently falls back to regex classification when:

  • RELAYPLANE_SIDECAR_URL is not set
  • The sidecar is unreachable at startup or request time
  • The HTTP request times out
  • The response is malformed or missing required fields
  • The returned confidence is below RELAYPLANE_SIDECAR_CONFIDENCE_THRESHOLD

No configuration is required to enable fallback. It is always active.
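The fallback conditions above can be sketched as a single classification function. This is a hypothetical illustration, not RelayPlane's code. The JSON request body and the response fields (`task`, `confidence`, `recommendedModel`) are assumed, `regex` stands in for whatever heuristic classifier the proxy uses, and treating a non-2xx status as a failure is an additional assumption.

```typescript
// HYPOTHETICAL sketch of the documented fallback rules.
interface Classification {
  source: "regex" | "sidecar";
  task: string;
  confidence?: number;
  recommendedModel?: string;
}

// Validate a parsed sidecar response; null means "ignore it and use regex".
function acceptSidecarResult(raw: unknown, threshold: number): Classification | null {
  if (typeof raw !== "object" || raw === null) return null; // malformed body
  const { task, confidence, recommendedModel } = raw as Record<string, unknown>;
  if (typeof task !== "string" || typeof confidence !== "number" || !Number.isFinite(confidence)) {
    return null; // missing or mistyped required fields
  }
  if (confidence < threshold) return null; // below the confidence threshold
  return {
    source: "sidecar",
    task,
    confidence,
    recommendedModel: typeof recommendedModel === "string" ? recommendedModel : undefined,
  };
}

async function classify(
  prompt: string,
  cfg: { url: string | null; confidenceThreshold: number; timeoutMs: number },
  regex: (p: string) => string,
): Promise<Classification> {
  const fallback: Classification = { source: "regex", task: regex(prompt) };
  if (!cfg.url) return fallback; // RELAYPLANE_SIDECAR_URL not set
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), cfg.timeoutMs);
  try {
    const res = await fetch(`${cfg.url}/v1/route`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ prompt }),
      signal: controller.signal,
    });
    if (!res.ok) return fallback; // assumption: non-2xx counts as a failure
    return acceptSidecarResult(await res.json(), cfg.confidenceThreshold) ?? fallback;
  } catch {
    return fallback; // unreachable, timed out (aborted), or invalid JSON
  } finally {
    clearTimeout(timer);
  }
}
```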

Observability

Each captured knowledge atom in ~/.relayplane/osmosis.db includes:

  • classifierSource: 'regex' or 'sidecar'
  • classifierConfidence: confidence score when the sidecar was used
  • classifierRecommendedModel: model recommended by the sidecar

Compare routing quality over time by querying classifier_source in the database.
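One hypothetical way to compare the two classifier paths, once atoms have been loaded from osmosis.db, is to count how many came from each source and average the sidecar confidence. Reading the database itself (driver, table and column names) is left out because the schema is not documented here. The sketch relies only on the three fields listed above, and `summarizeClassifiers` is an illustrative name.

```typescript
// HYPOTHETICAL summary over atoms already loaded from osmosis.db.
interface AtomClassifierFields {
  classifierSource: "regex" | "sidecar";
  classifierConfidence?: number | null;
  classifierRecommendedModel?: string | null;
}

interface SourceSummary {
  count: number;
  share: number; // fraction of all atoms
  meanConfidence: number | null; // null when no confidence scores are recorded
}

function summarizeClassifiers(
  atoms: AtomClassifierFields[],
): Record<"regex" | "sidecar", SourceSummary> {
  const total = atoms.length;
  const summarize = (source: "regex" | "sidecar"): SourceSummary => {
    const rows = atoms.filter((a) => a.classifierSource === source);
    const scores = rows
      .map((a) => a.classifierConfidence)
      .filter((c): c is number => typeof c === "number");
    return {
      count: rows.length,
      share: total === 0 ? 0 : rows.length / total,
      meanConfidence: scores.length ? scores.reduce((s, c) => s + c, 0) / scores.length : null,
    };
  };
  return { regex: summarize("regex"), sidecar: summarize("sidecar") };
}
```

Running it separately on atoms from two time windows shows whether the sidecar is handling a growing share of prompts and how confident it is when it does.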

Zero-config fallback. If you skip this setup entirely, the proxy works exactly as before using regex classification. The sidecar only activates when RELAYPLANE_SIDECAR_URL is set.