Model Fallback

Automatic failover to backup models when your primary model fails.

Overview

Model fallback keeps your workflows running even when a provider is down or rate-limited. Chain multiple models together and RelayPlane will automatically try the next one if the previous one fails.

Basic Usage

Use the .fallback() method to chain backup models:

```typescript
import { relay } from "@relayplane/sdk";

const result = await relay
  .workflow("resilient-analysis")
  .step("analyze")
  .with("openai:gpt-4o")
  .fallback("anthropic:claude-sonnet-4-20250514")
  .fallback("openai:gpt-4o-mini")
  .prompt("Analyze this document: {{input.text}}")
  .run({ text: documentContent });
```

How It Works

The fallback chain operates with retry-then-fallback semantics:

  1. Try the primary model with configured retries (exponential backoff)
  2. If all retries fail, move to the first fallback model
  3. Try that model with its own retry attempts
  4. Continue through the chain until success or all models exhausted

```text
Primary: openai:gpt-4o
 ├── Attempt 1 → Rate limit (429)
 ├── Attempt 2 → Rate limit (429)
 └── Attempt 3 → Still failing, move to fallback

Fallback 1: anthropic:claude-sonnet-4-20250514
 └── Attempt 1 → Success! ✓

Result: Success (using fallback model)
```
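The sequence above can be sketched as a plain loop. This is an illustrative approximation of retry-then-fallback semantics, not RelayPlane's actual internals; `runWithFallback` and its types are hypothetical names:

```typescript
// Hypothetical sketch: try each model in order, giving each its own
// retry budget with exponential backoff before moving to the next.

type ModelCall = () => Promise<string>;

interface RetryConfig {
  maxRetries: number; // retries after the first attempt
  backoffMs: number;  // base delay, doubled on each retry
}

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

async function runWithFallback(
  chain: { name: string; call: ModelCall }[],
  retry: RetryConfig
): Promise<{ output: string; model: string; fallbackIndex: number }> {
  let lastError: unknown;
  for (let i = 0; i < chain.length; i++) {
    const { name, call } = chain[i];
    for (let attempt = 0; attempt <= retry.maxRetries; attempt++) {
      try {
        return { output: await call(), model: name, fallbackIndex: i };
      } catch (err) {
        lastError = err;
        if (attempt < retry.maxRetries) {
          await sleep(retry.backoffMs * 2 ** attempt); // exponential backoff
        }
      }
    }
    // Retries exhausted for this model: fall through to the next one.
  }
  throw lastError;
}
```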

Combining with Retry

Retry settings apply per-model in the chain:

```typescript
relay
  .workflow("robust-extraction")
  .step("extract", {
    retry: { maxRetries: 2, backoffMs: 1000 } // each model gets 2 retries
  })
  .with("openai:gpt-4o")
  .fallback("anthropic:claude-sonnet-4-20250514")
  .fallback("openai:gpt-4o-mini")
  .prompt("Extract data from: {{input.document}}")
```

Each model in the fallback chain gets its own retry budget. With maxRetries: 2 (one initial attempt plus two retries per model) and 3 models, you could see up to 9 total attempts before final failure.
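As a sanity check on that arithmetic, each model gets one initial attempt plus maxRetries retries; `maxTotalAttempts` below is a hypothetical helper, not part of the SDK:

```typescript
// Worst-case attempt count: (1 initial attempt + maxRetries) per model,
// multiplied by the number of models in the fallback chain.
function maxTotalAttempts(maxRetries: number, modelCount: number): number {
  return (maxRetries + 1) * modelCount;
}

maxTotalAttempts(2, 3); // 3 attempts per model × 3 models = 9
```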

What Triggers Fallback

Fallback is triggered when a model fails, either after exhausting its retries or immediately for non-retryable errors:

| Error Type | Behavior |
| --- | --- |
| Rate limit (429) | Retry with backoff, then fallback |
| Server error (5xx) | Retry with backoff, then fallback |
| Timeout | Immediate fallback (no retry) |
| Provider outage | Retry with backoff, then fallback |
| Auth error (401/403) | Immediate fallback (non-recoverable) |
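The table above can be read as a small classifier. The sketch below is an illustrative approximation of that mapping, not RelayPlane's actual error-handling code:

```typescript
// Illustrative classifier for the behavior table above.
type FallbackBehavior = "retry-then-fallback" | "immediate-fallback";

function classifyError(error: { status?: number; timeout?: boolean }): FallbackBehavior {
  if (error.timeout) return "immediate-fallback";            // timeouts skip retries
  if (error.status === 401 || error.status === 403) {
    return "immediate-fallback";                             // auth errors are non-recoverable
  }
  return "retry-then-fallback";                              // 429, 5xx, provider outages
}
```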

Cross-Provider Fallback

Fallback models can be from any configured provider:

```typescript
relay.configure({
  providers: {
    openai: { apiKey: process.env.OPENAI_API_KEY },
    anthropic: { apiKey: process.env.ANTHROPIC_API_KEY },
    google: { apiKey: process.env.GOOGLE_API_KEY },
  }
});

// Mix providers in your fallback chain
relay
  .workflow("multi-provider")
  .step("generate")
  .with("openai:gpt-4o")                          // Primary: OpenAI
  .fallback("anthropic:claude-sonnet-4-20250514") // Fallback 1: Anthropic
  .fallback("google:gemini-1.5-pro")              // Fallback 2: Google
  .prompt("Generate a summary")
```

Cross-provider fallback provides maximum resilience. If one provider has an outage, your workflow automatically routes to another.

Checking Which Model Was Used

The workflow result includes metadata about fallback usage:

```typescript
const result = await workflow.run(input);

// Check if a fallback was used
if (result.metadata.fallbackUsed) {
  console.log("Primary model failed, used:", result.metadata.model);
  console.log("Original model was:", result.metadata.originalModel);
  console.log("Fallback index:", result.metadata.fallbackIndex);
}
```

Best Practices

  • Order by capability: Put your most capable model first, then progressively simpler models as fallbacks
  • Mix providers: Use different providers for true redundancy against outages
  • Consider cost: Fallback to cheaper models (e.g., gpt-4o → gpt-4o-mini) for cost efficiency
  • Test your chain: Verify all fallback models can handle your prompts and schemas
  • Monitor usage: Track which models are being used to detect provider issues early
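To put the monitoring advice into practice, a minimal tally of which models actually served your runs might look like the sketch below, assuming the `result.metadata` shape shown earlier; `tallyModels` is a hypothetical helper:

```typescript
// Count how often each model served a run, assuming the metadata
// shape from "Checking Which Model Was Used" above.
interface RunMetadata {
  fallbackUsed: boolean;
  model: string;
}

function tallyModels(runs: { metadata: RunMetadata }[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const run of runs) {
    counts[run.metadata.model] = (counts[run.metadata.model] ?? 0) + 1;
  }
  return counts;
}
```

A spike in fallback-model counts is an early signal that the primary provider is degraded.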

Complete Example

```typescript
import { relay } from "@relayplane/sdk";
import { z } from "zod";

// Configure multiple providers
relay.configure({
  providers: {
    openai: { apiKey: process.env.OPENAI_API_KEY },
    anthropic: { apiKey: process.env.ANTHROPIC_API_KEY },
  }
});

// Define schema for structured extraction
const SentimentSchema = z.object({
  sentiment: z.enum(["positive", "negative", "neutral"]),
  confidence: z.number().min(0).max(1),
  keywords: z.array(z.string()),
});

// Build resilient workflow with fallbacks
const result = await relay
  .workflow("sentiment-analysis")
  .step("analyze", {
    schema: SentimentSchema,
    retry: { maxRetries: 2, backoffMs: 500 }
  })
  .with("openai:gpt-4o")
  .fallback("anthropic:claude-sonnet-4-20250514")
  .fallback("openai:gpt-4o-mini")
  .prompt(`Analyze the sentiment of this text:

Text: {{input.text}}

Return sentiment (positive/negative/neutral), confidence score, and key phrases.`)
  .run({ text: "I absolutely love this product! Best purchase ever." });

if (result.success) {
  console.log("Sentiment:", result.finalOutput.sentiment);
  console.log("Confidence:", result.finalOutput.confidence);
} else {
  console.error("All models failed:", result.error?.message);
}
```