Your Claude calls will start failing June 15
unless you have a fallback.
Anthropic capped programmatic Claude usage on Max subscriptions starting June 15. When you hit the cap, requests return 429. RelayPlane routes the overflow to Gemini 2.5 Pro, GPT-4o, or local Ollama, silently, so your agents keep running.
Three steps. Thirty seconds.
No SDK swap, no rewrite. One env var pointed at the proxy, one config block with a fallback key.
1. Install the proxy
npm install -g @relayplane/proxy, then point your agent at http://localhost:4100 with one env var. Your Anthropic SDK calls keep working unchanged.
2. Add a fallback key
Drop an OpenAI, Google, or local Ollama endpoint into the credential pool in relayplane.config.yml. Pick any combo. Most teams pair Claude with Gemini 2.5 Pro for cost or GPT-4o for parity.
3. Cap hits, traffic auto-routes
When Anthropic returns a 429 or the proxy sees your usage approach the cap, RelayPlane fails over to the next credential. Your agent loop never sees the error. No alerts to wire, no retry logic to write.
The config, in full
This is the entire credential-pool block. Primary Claude, with Gemini 2.5 Pro, GPT-4o, and local Ollama as fallbacks in order.
# relayplane.config.yml
providers:
anthropic:
credentials:
- key: ${ANTHROPIC_KEY}
model: claude-sonnet-4-6
priority: 1
quota:
soft_limit: 0.90 # fail over at 90% of the June 15 cap
fallbacks:
- provider: google
key: ${GOOGLE_API_KEY}
model: gemini-2.5-pro
priority: 2
- provider: openai
key: ${OPENAI_API_KEY}
model: gpt-4o
priority: 3
- provider: ollama
endpoint: http://localhost:11434
model: llama3.1:70b
priority: 4 # last resort, runs on your hardwarePriority order is the routing order on cap or 429. Drop the providers you do not have. Free tier covers the credential pool and quota-aware routing.
What the fallback actually does for your agents
- ›Gemini 2.5 Pro takes over on long-context or reasoning tasks. Lower cost per token than Claude, comparable quality on most agent workloads.
- ›GPT-4o handles the parity case when your prompts depend on tool-use or vision and you need a drop-in for Claude Sonnet behavior.
- ›Local Ollama is the air-gap last resort. Llama 3.1 70B or Qwen 2.5 on your own hardware. Free per request, keeps agents alive through total API outages.
- ›Zero code changes. Your agent keeps calling the Anthropic SDK. The proxy rewrites the request to the fallback provider's API shape and returns an Anthropic-shaped response.