Explainability

Human-readable explanations for every decision in every run.

Overview

Every request through RelayPlane generates a decision chain that can be reconstructed into a human-readable explanation. This is essential for:

  • Debugging — Why did a request fail or behave unexpectedly?
  • Compliance — Audit trail of all decisions
  • Learning — Understanding cost and performance patterns
  • Trust — AI agents explain their decisions to humans

Getting an Explanation

curl http://localhost:3001/v1/runs/run_xyz789/explain

{
  "run_id": "run_xyz789",
  "narrative": "Request allowed. Used claude-3-5-sonnet via anthropic (primary choice). Cost: $0.015. Latency: 2.5s.",
  "timeline": [
    {
      "stage": "auth",
      "outcome": "passed",
      "timestamp": "2026-02-06T12:00:00.005Z",
      "detail": "API auth verified. Agent: support-bot, Workspace: ws_production"
    },
    {
      "stage": "policy",
      "outcome": "passed",
      "timestamp": "2026-02-06T12:00:00.010Z",
      "detail": "3 policies evaluated. Daily budget: $15.00/$50.00 used (30%). All passed."
    },
    {
      "stage": "routing",
      "outcome": "selected",
      "timestamp": "2026-02-06T12:00:00.015Z",
      "detail": "claude-3-5-sonnet selected. Matched capabilities: [chat, tool_use]. Score: 0.92. Fallback available: gpt-4o"
    },
    {
      "stage": "provider",
      "outcome": "success",
      "timestamp": "2026-02-06T12:00:02.500Z",
      "detail": "anthropic responded in 2.5s. 150 tokens. $0.015."
    }
  ],
  "insights": [
    "Daily budget was 30% used when this request was evaluated",
    "Primary model succeeded on first try",
    "Latency within normal range for claude-3-5-sonnet"
  ]
}
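The same call can be scripted. The sketch below is a minimal client using only the Python standard library; it assumes a RelayPlane instance at `http://localhost:3001` (as in the curl example) and the response fields shown above. The helper names `fetch_explanation` and `format_explanation` are hypothetical, not part of a RelayPlane SDK.

```python
import json
import urllib.request

# Assumption: local RelayPlane instance, as in the curl example above.
BASE_URL = "http://localhost:3001"


def fetch_explanation(run_id: str, base_url: str = BASE_URL) -> dict:
    """GET /v1/runs/{run_id}/explain and decode the JSON body."""
    url = f"{base_url}/v1/runs/{run_id}/explain"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def format_explanation(explanation: dict) -> str:
    """Render the narrative, one line per timeline stage, then the insights."""
    lines = [explanation["narrative"]]
    for step in explanation.get("timeline", []):
        lines.append(f"  [{step['stage']}] {step['outcome']}: {step.get('detail', '')}")
    for insight in explanation.get("insights", []):
        lines.append(f"  * {insight}")
    return "\n".join(lines)


# Usage (requires a running instance):
#   print(format_explanation(fetch_explanation("run_xyz789")))
```

Formatting is kept separate from fetching so the rendering can be checked against a saved response without a live server.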

Decision Timeline

Each run has a timeline of decision stages:

Stage      Outcomes                  Description
auth       passed, failed            Authentication and authorization check
policy     passed, failed, warned    Policy evaluation (budgets, allowlists, etc.)
routing    selected, fallback        Model and provider selection
provider   success, error, timeout   Provider request and response
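The table can be encoded as data and used to sanity-check a timeline returned by the explain endpoint. This is a sketch; the function names are hypothetical, and treating `warned` and `fallback` as outcomes that let the request continue is an assumption the table does not state explicitly.

```python
from __future__ import annotations

from typing import Optional

# Allowed outcomes per stage, transcribed from the table above.
STAGE_OUTCOMES = {
    "auth": {"passed", "failed"},
    "policy": {"passed", "failed", "warned"},
    "routing": {"selected", "fallback"},
    "provider": {"success", "error", "timeout"},
}

# Assumption: these outcomes let the request proceed to the next stage.
NON_BLOCKING = {"passed", "warned", "selected", "fallback", "success"}


def check_timeline(timeline: list[dict]) -> list[str]:
    """Return problems: unknown stages, or outcomes not listed for that stage."""
    problems = []
    for step in timeline:
        stage, outcome = step["stage"], step["outcome"]
        allowed = STAGE_OUTCOMES.get(stage)
        if allowed is None:
            problems.append(f"unknown stage {stage!r}")
        elif outcome not in allowed:
            problems.append(f"{stage}: unexpected outcome {outcome!r}")
    return problems


def first_stopping_stage(timeline: list[dict]) -> Optional[str]:
    """Name of the first stage whose outcome stopped the request, if any."""
    for step in timeline:
        if step["outcome"] not in NON_BLOCKING:
            return step["stage"]
    return None
```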

Explaining Failed Runs

curl http://localhost:3001/v1/runs/run_failed123/explain

{
  "run_id": "run_failed123",
  "narrative": "Request blocked by policy. Daily budget exceeded ($52.00/$50.00).",
  "timeline": [
    {
      "stage": "auth",
      "outcome": "passed",
      "detail": "API auth verified"
    },
    {
      "stage": "policy",
      "outcome": "failed",
      "detail": "Blocked by 'Daily Budget Cap'. Current spend: $52.00, Limit: $50.00"
    }
  ],
  "blocking_decision": {
    "type": "policy",
    "policy_name": "Daily Budget Cap",
    "policy_type": "budget.per_day",
    "reason": "Budget exhausted"
  },
  "suggestions": [
    "Increase daily budget limit in workspace settings",
    "Wait until tomorrow for budget reset",
    "Use a lower-cost model for this request"
  ]
}
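For alerting or chat notifications, the `blocking_decision` and `suggestions` fields can be condensed into a short report. A minimal sketch, assuming the field names in the example above; other block types may carry different keys, so `policy_name` is read defensively. `summarize_failure` is a hypothetical helper.

```python
def summarize_failure(explanation: dict) -> str:
    """Short report for a blocked run, built from an explain response."""
    block = explanation.get("blocking_decision")
    if block is None:
        return f"{explanation['run_id']}: not blocked"
    # Assumption: non-policy blocks may omit policy_name, so fall back to "unknown".
    name = block.get("policy_name", "unknown")
    header = f"{explanation['run_id']}: blocked at {block['type']} ({name}): {block['reason']}"
    tips = [f"  - {s}" for s in explanation.get("suggestions", [])]
    return "\n".join([header, *tips])
```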

Comparing Runs

Compare two runs to understand differences:

curl -X POST http://localhost:3001/v1/runs/compare \
  -H "Content-Type: application/json" \
  -d '{"run_ids": ["run_123", "run_456"]}'

{
  "runs": ["run_123", "run_456"],
  "differences": [
    {
      "field": "model",
      "run_123": "claude-3-5-sonnet",
      "run_456": "gpt-4o",
      "type": "value_changed"
    },
    {
      "field": "latency_ms",
      "run_123": 2500,
      "run_456": 4200,
      "type": "value_changed",
      "delta": "+68%"
    },
    {
      "field": "cost_usd",
      "run_123": 0.015,
      "run_456": 0.025,
      "type": "value_changed",
      "delta": "+67%"
    }
  ],
  "summary": "run_456 used gpt-4o instead of claude-3-5-sonnet, resulting in 68% higher latency and 67% higher cost"
}
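Each `delta` is the relative change from the first run to the second: (4200 − 2500) / 2500 = +68% for latency and (0.025 − 0.015) / 0.015 ≈ +67% for cost. The sketch below reproduces that arithmetic client-side, for example to compare runs already exported from RelayPlane. It assumes the server rounds to a whole percent relative to the first run, which matches the example but is not documented; `percent_delta` and `compare_runs` are hypothetical helpers.

```python
from __future__ import annotations


def percent_delta(before: float, after: float) -> str:
    """Relative change from `before` to `after`, rounded to a whole percent."""
    if before == 0:
        raise ValueError("relative change is undefined when the baseline is 0")
    pct = (after - before) / before * 100
    return f"{pct:+.0f}%"


def compare_runs(a: dict, b: dict, fields: list[str]) -> list[dict]:
    """List fields that differ between two run records, shaped like `differences`."""
    diffs = []
    for field in fields:
        va, vb = a.get(field), b.get(field)
        if va == vb:
            continue
        entry = {"field": field, a["run_id"]: va, b["run_id"]: vb, "type": "value_changed"}
        # Only numeric fields with a non-zero baseline get a percentage delta.
        if isinstance(va, (int, float)) and isinstance(vb, (int, float)) and va != 0:
            entry["delta"] = percent_delta(va, vb)
        diffs.append(entry)
    return diffs
```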

Simulation

Test what would happen without making a real request:

curl -X POST http://localhost:3001/v1/simulate \
  -H "Content-Type: application/json" \
  -d '{
    "workspace_id": "ws_123",
    "agent_id": "agent_456",
    "model": "claude-3-opus",
    "estimated_tokens": 50000
  }'

{
  "would_succeed": false,
  "simulated_decisions": [
    {
      "stage": "policy",
      "outcome": "would_fail",
      "detail": "model.allowlist: claude-3-opus not in approved models"
    }
  ],
  "estimated_cost": 2.50,
  "recommendations": [
    "Use claude-3-5-sonnet instead (approved and 85% cheaper)",
    "Request model approval from workspace admin"
  ]
}
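Simulation is useful as a pre-flight check before expensive operations: it reports which stage would block the request and the estimated cost, without making a real request.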
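A pre-flight gate can wrap this endpoint. The sketch below posts to `/v1/simulate` with the standard library and only approves the real request when the simulation says it would succeed and the estimated cost stays under a caller-chosen limit. The `max_cost_usd` limit and the helper names are hypothetical (client-side, not a RelayPlane setting); the simulator is injectable so the decision logic can be tested without a server.

```python
from __future__ import annotations

import json
import urllib.request
from typing import Callable


def simulate(payload: dict, base_url: str = "http://localhost:3001") -> dict:
    """POST the payload to /v1/simulate and return the decoded JSON response."""
    req = urllib.request.Request(
        f"{base_url}/v1/simulate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def preflight(payload: dict, max_cost_usd: float,
              sim: Callable[[dict], dict] = simulate) -> tuple[bool, list[str]]:
    """Return (go, reasons): whether to send the real request, and why not."""
    result = sim(payload)
    # Collect the detail of every stage the simulation says would fail.
    reasons = [d["detail"] for d in result.get("simulated_decisions", [])
               if d.get("outcome") == "would_fail"]
    if not result.get("would_succeed", False):
        return False, reasons + result.get("recommendations", [])
    cost = result.get("estimated_cost", 0.0)
    if cost > max_cost_usd:
        return False, [f"estimated cost ${cost:.2f} exceeds limit ${max_cost_usd:.2f}"]
    return True, []
```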

Debugging Tips

  • Check the timeline — Most issues are visible in the decision timeline
  • Compare with working runs — Use run comparison to spot differences
  • Check policy order — Policies evaluate in priority order
  • Verify provider health — Check whether the provider was degraded when the run failed
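To compare a failing run with a working one at the timeline level, the stages can be lined up side by side. This sketch assumes each stage appears at most once per timeline and in the order of the stage table (auth, policy, routing, provider); `diff_timelines` is a hypothetical helper operating on the `timeline` arrays returned by the explain endpoint.

```python
from __future__ import annotations

# Stage order taken from the decision timeline table.
STAGE_ORDER = ["auth", "policy", "routing", "provider"]


def diff_timelines(working: list[dict], failing: list[dict]) -> list[str]:
    """Report, stage by stage, where a failing run diverges from a working one."""
    good = {s["stage"]: s for s in working}
    bad = {s["stage"]: s for s in failing}
    report = []
    for stage in STAGE_ORDER:
        g, b = good.get(stage), bad.get(stage)
        if g is None and b is None:
            continue
        if b is None:
            # The failing run stopped before reaching this stage.
            report.append(f"{stage}: not reached in failing run")
        elif g is None or g["outcome"] != b["outcome"]:
            was = g["outcome"] if g else "absent"
            report.append(f"{stage}: {was} -> {b['outcome']} ({b.get('detail', '')})")
    return report
```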

Next Steps