OpenRouter Fallback Config
Overview
OpenRouter supports native model fallbacks: pass multiple model IDs and OpenRouter tries each in order until one succeeds. You can also use provider.order to control which provider serves a specific model. This skill covers native fallbacks, provider routing, client-side fallback chains, and timeout configuration.
Native Model Fallback (Server-Side)
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
    default_headers={"HTTP-Referer": "https://my-app.com", "X-Title": "my-app"},
)

# Pass multiple models -- OpenRouter tries each in order
response = client.chat.completions.create(
    model="anthropic/claude-3.5-sonnet",  # Primary (used for param validation)
    messages=[{"role": "user", "content": "Explain recursion"}],
    max_tokens=500,
    extra_body={
        "models": [
            "anthropic/claude-3.5-sonnet",
            "openai/gpt-4o",
            "google/gemini-2.0-flash-001",
        ],
        "route": "fallback",  # Try in order until one succeeds
    },
)

# Check which model actually served the request
print(f"Served by: {response.model}")
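A small helper can keep the primary model and the fallback list in sync. The sketch below is illustrative only: build_fallback_request is a hypothetical name (not part of the OpenAI SDK or OpenRouter), and it only assembles keyword arguments without calling the API.

```python
def build_fallback_request(models: list[str], messages: list[dict], **params) -> dict:
    """Assemble kwargs for client.chat.completions.create with server-side fallback.

    Hypothetical helper: the first model ID becomes the primary `model`,
    and the full ordered list is sent as `models` in extra_body.
    """
    if not models:
        raise ValueError("at least one model ID is required")
    # Drop duplicate IDs while keeping priority order.
    ordered = list(dict.fromkeys(models))
    return {
        "model": ordered[0],
        "messages": messages,
        "extra_body": {"models": ordered, "route": "fallback"},
        **params,
    }


kwargs = build_fallback_request(
    ["anthropic/claude-3.5-sonnet", "openai/gpt-4o", "anthropic/claude-3.5-sonnet"],
    [{"role": "user", "content": "Explain recursion"}],
    max_tokens=500,
)
print(kwargs["extra_body"]["models"])
```

The result can then be passed as client.chat.completions.create(**kwargs).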
Provider Fallback (Same Model, Different Providers)
# Route to specific providers in priority order.
# Provider names must match the names OpenRouter lists for this model.
response = client.chat.completions.create(
    model="anthropic/claude-3.5-sonnet",
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=200,
    extra_body={
        "provider": {
            "order": ["Anthropic", "AWS Bedrock", "GCP Vertex"],
            "allow_fallbacks": True,  # Fall to next provider if first fails
        },
    },
)
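When a request must stay on a specific provider, allow_fallbacks can be set to False so the request is not routed to providers outside the order list. A minimal sketch, assuming a hypothetical helper named provider_preferences that only builds the extra_body payload:

```python
def provider_preferences(order: list[str], strict: bool = False) -> dict:
    """Build the `provider` object for extra_body.

    Hypothetical helper: strict=True disables fallback to providers
    outside `order`, so the request fails instead of being rerouted.
    """
    if not order:
        raise ValueError("order must list at least one provider")
    return {"provider": {"order": list(order), "allow_fallbacks": not strict}}


pinned = provider_preferences(["Anthropic"], strict=True)
print(pinned)
```

Pass the result as extra_body=pinned in the completion call.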
Client-Side Fallback Chain
import logging
import os

from openai import OpenAI, APIError, APITimeoutError

log = logging.getLogger("openrouter.fallback")

FALLBACK_CHAIN = [
    {"model": "anthropic/claude-3.5-sonnet", "timeout": 30.0, "label": "primary"},
    {"model": "openai/gpt-4o", "timeout": 25.0, "label": "secondary"},
    {"model": "openai/gpt-4o-mini", "timeout": 15.0, "label": "budget-fallback"},
    {"model": "google/gemini-2.0-flash-001", "timeout": 15.0, "label": "last-resort"},
]


def resilient_completion(messages: list[dict], max_tokens: int = 1024, **kwargs):
    """Try each model in the fallback chain until one succeeds."""
    last_error = None
    for config in FALLBACK_CHAIN:
        try:
            # A fresh client per attempt applies that model's timeout.
            client = OpenAI(
                base_url="https://openrouter.ai/api/v1",
                api_key=os.environ["OPENROUTER_API_KEY"],
                timeout=config["timeout"],
                default_headers={"HTTP-Referer": "https://my-app.com", "X-Title": "my-app"},
            )
            response = client.chat.completions.create(
                model=config["model"],
                messages=messages,
                max_tokens=max_tokens,
                **kwargs,
            )
            log.info(f"Served by {config['label']}: {response.model}")
            return response
        except (APIError, APITimeoutError) as e:
            # Record the failure and move on to the next model in the chain.
            last_error = e
            log.warning(f"{config['label']} failed ({config['model']}): {e}")
            continue
    raise RuntimeError(f"All fallbacks exhausted. Last error: {last_error}") from last_error
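Per-model timeouts alone do not bound the total wait, because each failed attempt adds its own timeout. One option is an overall deadline that shrinks each attempt's timeout to the time remaining. The sketch below is an assumption-labelled illustration: run_chain and fake_call are hypothetical names, and the API call is injected as a plain callable so the loop can be exercised without network access.

```python
import time
from typing import Callable


def run_chain(chain: list[dict], call: Callable[[str, float], str],
              budget_s: float, clock: Callable[[], float] = time.monotonic) -> str:
    """Try each chain entry in order without exceeding an overall time budget.

    `call(model, timeout)` performs one attempt and raises on failure.
    """
    deadline = clock() + budget_s
    errors = []
    for entry in chain:
        remaining = deadline - clock()
        if remaining <= 0:
            break  # Budget spent: stop instead of starting another attempt.
        timeout = min(entry["timeout"], remaining)
        try:
            return call(entry["model"], timeout)
        except Exception as exc:  # Sketch only; narrow this in real code.
            errors.append((entry["model"], repr(exc)))
    raise RuntimeError(f"All fallbacks failed or budget spent: {errors}")


def fake_call(model: str, timeout: float) -> str:
    """Stand-in for an API call: the primary always times out."""
    if model == "anthropic/claude-3.5-sonnet":
        raise TimeoutError("simulated timeout")
    return f"ok from {model} (timeout={timeout:.0f}s)"


chain = [
    {"model": "anthropic/claude-3.5-sonnet", "timeout": 30.0},
    {"model": "openai/gpt-4o", "timeout": 25.0},
]
result = run_chain(chain, fake_call, budget_s=40.0)
print(result)
```

In real use, call would wrap client.chat.completions.create with the given timeout.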
Fallback with Capability Matching
# Different models support different features. Match capabilities.
CAPABILITY_CHAINS = {
    "tool_calling": [
        "anthropic/claude-3.5-sonnet",
        "openai/gpt-4o",
        "openai/gpt-4o-mini",
    ],
    "vision": [
        "openai/gpt-4o",
        "anthropic/claude-3.5-sonnet",
        "google/gemini-2.0-flash-001",
    ],
    "long_context": [
        "google/gemini-2.0-flash-001",  # 1M context
        "anthropic/claude-3.5-sonnet",  # 200K context
        "openai/gpt-4o",  # 128K context
    ],
    "budget": [
        "openai/gpt-4o-mini",
        "meta-llama/llama-3.1-8b-instruct",
        "google/gemma-2-9b-it:free",
    ],
}
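
def capability_fallback(messages, capability="tool_calling", **kwargs):
    """Select the fallback chain for a capability and let OpenRouter try it in order."""
    chain = CAPABILITY_CHAINS.get(capability, CAPABILITY_CHAINS["tool_calling"])
    # Send the selected chain as a server-side fallback list via the module-level client.
    extra_body = {"models": chain, "route": "fallback"}
    return client.chat.completions.create(
        model=chain[0], messages=messages, extra_body=extra_body, **kwargs
    )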
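A request that needs more than one capability (for example tool calling and vision) should only fall back to models that appear in every relevant chain. A minimal sketch, assuming a hypothetical helper models_supporting and a reduced copy of the chains named EXAMPLE_CHAINS:

```python
EXAMPLE_CHAINS = {
    "tool_calling": [
        "anthropic/claude-3.5-sonnet",
        "openai/gpt-4o",
        "openai/gpt-4o-mini",
    ],
    "vision": [
        "openai/gpt-4o",
        "anthropic/claude-3.5-sonnet",
        "google/gemini-2.0-flash-001",
    ],
}


def models_supporting(chains: dict[str, list[str]], required: list[str]) -> list[str]:
    """Return models present in every required chain, in the first chain's order."""
    if not required:
        raise ValueError("name at least one capability")
    first, *rest = (chains[name] for name in required)
    return [model for model in first if all(model in other for other in rest)]


both = models_supporting(EXAMPLE_CHAINS, ["tool_calling", "vision"])
print(both)
```

The priority order comes from the first capability listed, so put the capability whose ranking matters most first.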
Testing Fallbacks
# Test with an invalid model to trigger fallback
curl -s https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "invalid/model-name",
    "messages": [{"role": "user", "content": "test"}],
    "max_tokens": 10,
    "models": ["invalid/model-name", "openai/gpt-4o-mini"],
    "route": "fallback"
  }' | jq '{model: .model, content: .choices[0].message.content}'
# Should succeed with openai/gpt-4o-mini
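The same check can be automated by comparing the model field of the response with the requested primary. The sketch below is a hedged illustration: fallback_used is a hypothetical helper, the sample payload is made up for the example (not real API output), and the comparison is an exact string match, so it would need normalizing if the served ID is returned in a different form than the requested one.

```python
import json


def fallback_used(response_json: str, primary: str) -> bool:
    """Return True if the response reports a model other than the requested primary."""
    served = json.loads(response_json).get("model", "")
    return served != primary


# Made-up sample payload shaped like a chat completion response.
sample = '{"model": "openai/gpt-4o-mini", "choices": [{"message": {"content": "ok"}}]}'
print(fallback_used(sample, "invalid/model-name"))
```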
Error Handling
| Error | Cause | Fix |
|-------|-------|-----|
| All fallbacks exhausted | Every model in chain failed | Add more diverse providers; alert on full chain failure |
| Slow cascade | Each model timing out sequentially | Reduce per-model timeout to 10-15s |
| Inconsistent responses | Different models have different capabilities | Ensure all fallback models support features your prompt uses |
| Wrong model served | Fallback triggered unexpectedly | Log which model served each request; check primary model health |
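The slow-cascade row comes down to simple arithmetic: in the worst case every model times out, so the wait is the sum of the per-model timeouts. The sketch below uses the timeouts from the client-side chain; worst_case_seconds and cap_timeouts are hypothetical helper names. The max_retries parameter reflects an assumption worth checking in your setup: the OpenAI Python SDK retries some failures (including timeouts) a default number of times, which multiplies the worst case unless retries are disabled.

```python
def worst_case_seconds(timeouts: list[float], max_retries: int = 0) -> float:
    """Total wait if every attempt (including retries) runs to its timeout."""
    return sum(t * (max_retries + 1) for t in timeouts)


def cap_timeouts(timeouts: list[float], per_model_cap: float) -> list[float]:
    """Clamp each per-model timeout to at most `per_model_cap` seconds."""
    return [min(t, per_model_cap) for t in timeouts]


chain_timeouts = [30.0, 25.0, 15.0, 15.0]
uncapped = worst_case_seconds(chain_timeouts)
capped = worst_case_seconds(cap_timeouts(chain_timeouts, 15.0))
with_retries = worst_case_seconds(chain_timeouts, max_retries=2)
print(uncapped, capped, with_retries)  # 85.0 60.0 255.0
```

Backoff delays between retries are not included, so the real figure with retries enabled would be somewhat higher.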
Enterprise Considerations
- Use server-side fallback (models + route: "fallback") for simplicity; client-side for fine-grained control
- Set per-model timeouts: expensive models get longer timeouts, budget fallbacks get shorter
- Log which model served each request to track fallback frequency (indicates primary model issues)
- Test fallback chains regularly by intentionally failing the primary model
- Match fallback models by capability (tool calling, vision, context length) to avoid silent feature degradation
- Use provider.order when you need the same model from a different provider (e.g., Claude via Anthropic direct vs AWS Bedrock)
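Tracking fallback frequency can be as simple as counting which model served each request. A minimal in-memory sketch, assuming a hypothetical FallbackStats class; a production setup would more likely emit these counts to a metrics system.

```python
from collections import Counter


class FallbackStats:
    """Count served models and report how often the primary was bypassed."""

    def __init__(self, primary: str) -> None:
        self.primary = primary
        self.served: Counter[str] = Counter()

    def record(self, served_model: str) -> None:
        """Record the model reported in one response (e.g. response.model)."""
        self.served[served_model] += 1

    def fallback_rate(self) -> float:
        """Fraction of requests served by any model other than the primary."""
        total = sum(self.served.values())
        if total == 0:
            return 0.0
        return 1 - self.served[self.primary] / total


stats = FallbackStats(primary="anthropic/claude-3.5-sonnet")
for model in ["anthropic/claude-3.5-sonnet"] * 3 + ["openai/gpt-4o"]:
    stats.record(model)
print(stats.fallback_rate())  # 0.25
```

A rising rate is the signal mentioned above that the primary model or its provider may be unhealthy.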