# LLM Gateway
A unified interface for working with multiple LLM providers through a factory pattern, YAML-driven configuration, and provider-specific client implementations.
## Key Capabilities
- **Factory Pattern**: Create provider-specific clients through a unified gateway
- **YAML Configuration**: Centralized model definitions with features, pricing, and context limits
- **Provider Abstraction**: Common interface across OpenAI, Anthropic, Gemini, Ollama, and others
- **Feature Detection**: Query model capabilities (vision, function_calling, embeddings)
- **Standardized Responses**: Consistent response structure across all providers
- **Error Handling**: Structured errors with retry logic for rate limits
- **Embeddings Support**: Unified embeddings API with batch support (see the sketch below)
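The embeddings capability has no dedicated section below, so here is a minimal sketch of what a unified embeddings call could look like. The `create_embeddings` method name, its argument shape, and the `embeddings` accessor are illustrative assumptions, not the gateway's confirmed API:

```ruby
# Hypothetical embeddings call -- method name, argument shape, and the
# `embeddings` accessor are assumptions, not the gateway's confirmed API.
client = LLMGateway.create(provider: :openai, api_key: ENV['OPENAI_API_KEY'])

# Batch support: one request, one vector back per input string.
response = client.create_embeddings(
  model: "text-embedding-3-small",
  input: ["first document", "second document"]
)

response.embeddings.each_with_index do |vector, i|
  puts "document #{i}: #{vector.length} dimensions"
end
```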
## Architecture
```
              ┌─────────────────────┐
              │     LLMGateway      │
              │   (Factory Class)   │
              └─────────┬───────────┘
                        │ create(provider:, api_key:)
                        ▼
              ┌─────────────────────┐
              │      LLMConfig      │
              │    (YAML Loader)    │
              └─────────┬───────────┘
                        │
     ┌───────────┬──────┴─────────┬────────────┐
     ▼           ▼                ▼            ▼
┌─────────┐ ┌──────────┐     ┌──────────┐ ┌─────────┐
│ OpenAI  │ │Anthropic │ ... │  Gemini  │ │ Ollama  │
│ Client  │ │  Client  │     │  Client  │ │ Client  │
└────┬────┘ └────┬─────┘     └────┬─────┘ └────┬────┘
     │           │                │            │
     └───────────┴──────┬────────┴─────────────┘
                        ▼
              ┌─────────────────────┐
              │      LLMClient      │
              │    (Base Class)     │
              │   - LLMResponse     │
              │   - FEATURES        │
              │   - Retry Logic     │
              └─────────────────────┘
```
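A minimal sketch of the factory layer, assuming the whitelist is a hard-coded provider-to-class map (the registry contents here are illustrative):

```ruby
# Factory sketch: the registry doubles as a whitelist, so a provider
# symbol can never be reflected into an arbitrary class name.
class LLMGateway
  CLIENT_CLASSES = {
    openai:    "OpenAIClient",
    anthropic: "AnthropicClient",
    gemini:    "GeminiClient",
    ollama:    "OllamaClient"
  }.freeze

  def self.create(provider:, api_key:)
    class_name = CLIENT_CLASSES.fetch(provider.to_sym) do
      raise ArgumentError, "Unknown provider: #{provider}"
    end
    Object.const_get(class_name).new(api_key: api_key)
  end
end
```

Looking the class name up in a frozen hash, rather than interpolating caller input into `const_get`, is what the "unsafe reflection" items later in this document guard against.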
## Quick Start
```ruby
# 1. Create a client through the gateway
client = LLMGateway.create(provider: :openai, api_key: ENV['OPENAI_API_KEY'])

# 2. Send a message
response = client.create_message(
  system: "You are a helpful assistant",
  model: "gpt-4o",
  limit: 1000,
  messages: [{ role: "user", content: "Hello!" }]
)

# 3. Access the standardized response
puts response.content       # "Hello! How can I help you?"
puts response.finish_reason # "stop"
puts response.usage         # { "prompt_tokens" => 10, "completion_tokens" => 8 }
```
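One way the standardized response could be modeled is a plain struct; the field names below mirror the Quick Start output, while `tool_calls` and the exact struct shape are assumptions:

```ruby
# Sketch of the standardized response object. Field names follow the
# Quick Start example; `tool_calls` is an assumed extra field.
LLMResponse = Struct.new(:content, :finish_reason, :usage, :tool_calls,
                         keyword_init: true)

response = LLMResponse.new(
  content: "Hello! How can I help you?",
  finish_reason: "stop",
  usage: { "prompt_tokens" => 10, "completion_tokens" => 8 }
)
puts response.content # provider-agnostic access, whatever the backing API
```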
## Core Components
### Factory Pattern
```ruby
# Simple creation
client = LLMGateway.create(provider: :anthropic, api_key: api_key)

# Model-aware creation (routes to the correct API variant)
client = LLMGateway.create_for_model(
  provider: :openai,
  model_name: "gpt-4o",
  api_key: api_key
)
```
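A hedged sketch of what the model-aware routing might do; the o-series prefix check and the `OpenAIResponsesClient` class are illustrative assumptions, not the library's actual dispatch logic:

```ruby
# Hypothetical routing: choose an API variant from the model name.
# OpenAIResponsesClient and the o-series prefix check are assumptions.
class LLMGateway
  def self.create_for_model(provider:, model_name:, api_key:)
    if provider.to_sym == :openai && model_name.start_with?("o1", "o3")
      OpenAIResponsesClient.new(api_key: api_key)
    else
      create(provider: provider, api_key: api_key)
    end
  end
end
```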
### Configuration System
```yaml
# config/llm_models.yml
providers:
  openai:
    name: OpenAI
    client_class: OpenAIClient
    models:
      gpt-4o:
        model: gpt-4o
        features: [vision, function_calling, multimodal]
        context_length: 128000
        pricing: { input: 0.0025, output: 0.01 }
```
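A minimal loader sketch for this file, assuming the YAML is parsed once and memoized; lookup method names beyond the documented ones are illustrative:

```ruby
require "yaml"

# Loader sketch: parse config/llm_models.yml once, memoize, expose lookups.
module LLMConfig
  CONFIG_PATH = "config/llm_models.yml"

  def self.data
    @data ||= YAML.load_file(CONFIG_PATH)
  end

  def self.model(provider, model_name)
    data.dig("providers", provider.to_s, "models", model_name) || {}
  end

  def self.context_length(provider, model_name)
    model(provider, model_name)["context_length"]
  end
end
```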
### Feature Detection
```ruby
# Check model features
LLMConfig.supports_vision?(:openai, "gpt-4o")           # => true
LLMConfig.supports_function_calling?(:openai, "gpt-4o") # => true

# Find models by feature
LLMConfig.models_with_feature(:embeddings)
LLMConfig.cheapest_model_with_features(:openai, ["vision", "function_calling"])
```
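One plausible implementation of the cheapest-model lookup, reusing the loader sketch above: filter models whose `features` array covers everything requested, then take the lowest input price. This is a guess at the semantics, not the library's actual code:

```ruby
module LLMConfig
  # Sketch: cheapest model (by input price) supporting all required features.
  def self.cheapest_model_with_features(provider, required_features)
    required = required_features.map(&:to_s)
    models = data.dig("providers", provider.to_s, "models") || {}
    candidates = models.select do |_name, meta|
      (required - (meta["features"] || [])).empty?
    end
    candidates.min_by { |_name, meta| meta.dig("pricing", "input").to_f }&.first
  end
end
```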
## When to Use This Pattern
**Ideal for:**
- Applications requiring multiple LLM providers
- Cost optimization through model selection
- Feature-based routing (e.g., vision-capable models)
- Consistent error handling across providers
**Consider alternatives if:**
- Single provider only (use official SDK directly)
- Streaming-only workloads (add streaming layer)
- Very high throughput (consider async patterns)
## Output Checklist
When implementation is complete, verify:
- [ ] LLMGateway creates correct client for each provider
- [ ] Client whitelist prevents unsafe reflection attacks
- [ ] YAML config loads with all model metadata (features, pricing, context)
- [ ] Feature detection works: `supports_vision?`, `supports_function_calling?`
- [ ] LLMResponse struct returned consistently across all providers
- [ ] Rate limit errors trigger automatic retry with backoff (see the sketch after this checklist)
- [ ] API errors include structured `error_type` and `error_code` fields
- [ ] Model sync rake task updates pricing from OpenRouter
- [ ] Usage tracking records tokens and calculates costs
- [ ] Multi-tenant support with per-account API keys
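As referenced in the checklist, here is a sketch of retry-with-backoff for rate limits; the `RateLimitError` class, the attempt cap, and the delay schedule are assumptions:

```ruby
# Assumed error class raised on HTTP 429 responses.
class RateLimitError < StandardError; end

# Backoff sketch: retry rate-limited calls with exponentially growing sleeps.
class LLMClient
  MAX_RETRIES = 3

  def with_retries
    attempts = 0
    begin
      yield
    rescue RateLimitError
      attempts += 1
      raise if attempts > MAX_RETRIES
      sleep(2**attempts) # 2s, 4s, 8s
      retry
    end
  end
end
```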
## Common Pitfalls
- **Unsafe reflection**: Always whitelist allowed client classes
- **Sending internal metadata to APIs**: Filter `pricing`, `features`, and `description` before API calls
- **Provider-specific token params**: OpenAI o-series models use `max_completion_tokens`, not `max_tokens`
- **Missing feature arrays in seeds**: Model config must include a `features` array for detection
- **Rate limit without retry**: Always implement exponential backoff for 429 responses
- **Inconsistent usage keys**: Normalize `prompt_tokens` vs `input_tokens` vs `promptTokenCount` (see the sketch after this list)
- **Price conversion errors**: OpenRouter returns per-token prices; the config expects per-1K-token prices
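A sketch of fixes for the last two pitfalls. The three `prompt_tokens` variants come from the list above; the completion-side keys (`output_tokens`, `candidatesTokenCount`) follow the public Anthropic and Gemini usage formats but should be treated as assumptions here:

```ruby
# Usage-key normalization sketch across provider response formats.
def normalize_usage(raw)
  {
    "prompt_tokens"     => raw["prompt_tokens"] || raw["input_tokens"] ||
                           raw["promptTokenCount"],
    "completion_tokens" => raw["completion_tokens"] || raw["output_tokens"] ||
                           raw["candidatesTokenCount"]
  }
end

# OpenRouter prices are per token; the YAML config stores per-1K tokens.
def per_1k_price(per_token_price)
  per_token_price.to_f * 1000
end

per_1k_price(0.0000025) # => 0.0025, matching the gpt-4o input price above
```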
## Testing Notes
### Gateway Testing
- Test client creation for each provider
- Verify whitelist rejects unknown client classes (see the sketch after this list)
- Test model-aware routing (e.g., OpenAI responses vs chat API)
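A hedged test sketch for the first two points. RSpec and the `ArgumentError` raised for unknown providers are both illustrative choices, not confirmed behaviour:

```ruby
RSpec.describe LLMGateway do
  it "creates the provider-specific client" do
    client = LLMGateway.create(provider: :openai, api_key: "test-key")
    expect(client).to be_a(OpenAIClient)
  end

  it "rejects providers outside the whitelist" do
    expect {
      LLMGateway.create(provider: :not_a_real_provider, api_key: "test-key")
    }.to raise_error(ArgumentError)
  end
end
```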
### Client Testing
- Test standardized LLMResponse across providers
- Verify message formatting for each provider's API format
- Test tool call formatting (different for OpenAI vs Anthropic vs Gemini; sketched below)
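To make the formatting difference concrete, here is one internal tool call rendered into two provider shapes. The internal `call` hash is an assumption; the output shapes follow the public OpenAI and Anthropic message formats:

```ruby
require "json"

# Assumed internal representation of a single tool call.
call = { id: "call_1", name: "get_weather", args: { "city" => "Paris" } }

# OpenAI: nested under "function", arguments as a JSON string.
openai_format = {
  id: call[:id],
  type: "function",
  function: { name: call[:name], arguments: JSON.generate(call[:args]) }
}

# Anthropic: a "tool_use" content block, input as a plain object.
anthropic_format = {
  type: "tool_use",
  id: call[:id],
  name: call[:name],
  input: call[:args]
}
```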
### Error Handling Testing
- Test rate limit detection and retry
- Verify structured APIError with type/code
- Test timeout handling
### Configuration Testing
- Test YAML loading and caching
- Verify feature detection methods
- Test `cheapest_model_with_features` selection
### Integration Testing
- End-to-end request through gateway
- Usage tracking and cost calculation
- Multi-tenant isolation
## References
Detailed implementation guides:
- Gateway and Factory - LLMGateway and client creation
- Base Client - LLMClient abstract class
- Configuration System - YAML-driven model config
- Provider Implementations - OpenAI, Anthropic, Gemini
- Error Handling - Structured errors and retry logic
- Model Sync - Syncing models from provider APIs
- Usage and Cost - Token tracking and cost calculation
- Rails Adapter - Rails-specific integration patterns