Agent Skills: Multi-Provider Patterns and Raw SDK Access

Use when switching between LLM providers, accessing provider-specific features (Anthropic caching, OpenAI logprobs), or using raw SDK clients - covers multi-provider patterns and direct SDK access for OpenAI, Anthropic, Google, and Ollama

ID: juanre/llmring/providers



Multi-Provider Patterns and Raw SDK Access

Installation

# With uv (recommended)
uv add llmring

# With pip
pip install llmring

Provider SDKs (install what you need):

uv add "openai>=1.0"     # OpenAI
uv add "anthropic>=0.67" # Anthropic
uv add google-genai      # Google Gemini
uv add "ollama>=0.4"     # Ollama

API Overview

This skill covers:

  • get_provider() method for raw SDK access
  • Provider initialization and configuration
  • Provider-specific features (caching, extra parameters)
  • Multi-provider patterns and switching
  • Fallback behavior

Quick Start

from llmring import LLMRing

async with LLMRing() as service:
    # Get raw provider client
    openai_client = service.get_provider("openai").client
    anthropic_client = service.get_provider("anthropic").client

    # Use provider SDK directly
    response = await openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello"}],
        logprobs=True  # Provider-specific feature
    )

Complete API Documentation

LLMRing.get_provider()

Get raw provider client for direct SDK access.

Signature:

def get_provider(provider_type: str) -> BaseLLMProvider

Parameters:

  • provider_type (str): Provider name - "openai", "anthropic", "google", or "ollama"

Returns:

  • BaseLLMProvider: Provider wrapper with .client attribute for raw SDK

Raises:

  • ProviderNotFoundError: If provider not configured or API key missing

Example:

from llmring import LLMRing

async with LLMRing() as service:
    # Get providers
    openai_provider = service.get_provider("openai")
    anthropic_provider = service.get_provider("anthropic")

    # Access raw clients
    openai_client = openai_provider.client      # openai.AsyncOpenAI
    anthropic_client = anthropic_provider.client # anthropic.AsyncAnthropic

Provider Clients

Each provider exposes its native SDK client:

OpenAI:

provider = service.get_provider("openai")
client = provider.client  # openai.AsyncOpenAI instance

Anthropic:

provider = service.get_provider("anthropic")
client = provider.client  # anthropic.AsyncAnthropic instance

Google:

provider = service.get_provider("google")
client = provider.client  # google.genai.Client instance

Ollama:

provider = service.get_provider("ollama")
client = provider.client  # ollama.AsyncClient instance

Provider Initialization

Providers are automatically initialized based on environment variables:

Environment Variables:

# OpenAI
OPENAI_API_KEY=sk-...

# Anthropic
ANTHROPIC_API_KEY=sk-ant-...

# Google (any of these)
GOOGLE_GEMINI_API_KEY=AIza...
GEMINI_API_KEY=AIza...
GOOGLE_API_KEY=AIza...

# Ollama (optional, default shown)
OLLAMA_BASE_URL=http://localhost:11434

What gets initialized:

  • OpenAI: If OPENAI_API_KEY is set
  • Anthropic: If ANTHROPIC_API_KEY is set
  • Google: If any Google API key is set
  • Ollama: Always (local, no key needed)
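
A quick way to verify this before constructing LLMRing is to check the environment directly. A minimal sketch using only the standard library (the key names are the ones listed above):

import os

# Report which provider keys are visible to this process
key_map = {
    "openai": ["OPENAI_API_KEY"],
    "anthropic": ["ANTHROPIC_API_KEY"],
    "google": ["GOOGLE_GEMINI_API_KEY", "GEMINI_API_KEY", "GOOGLE_API_KEY"],
}

for provider, keys in key_map.items():
    status = "configured" if any(os.environ.get(k) for k in keys) else "missing key"
    print(f"{provider}: {status}")
print("ollama: always initialized (local, no key needed)")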

Provider-Specific Features

OpenAI: Logprobs and Advanced Parameters

from llmring import LLMRing

async with LLMRing() as service:
    openai_client = service.get_provider("openai").client

    # Use OpenAI-specific features
    response = await openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello"}],
        logprobs=True,           # Token probabilities
        top_logprobs=5,          # Top 5 alternatives
        seed=12345,              # Deterministic sampling
        presence_penalty=0.1,    # Reduce repetition
        frequency_penalty=0.2,   # Reduce frequency
        parallel_tool_calls=False  # Sequential tools
    )

    # Access logprobs
    if response.choices[0].logprobs:
        for token_info in response.choices[0].logprobs.content:
            print(f"Token: {token_info.token}, prob: {token_info.logprob}")

OpenAI: Reasoning Models (o1 series)

from llmring import LLMRing, LLMRequest, Message

async with LLMRing() as service:
    # Using unified API
    request = LLMRequest(
        model="openai:o1",
        messages=[Message(role="user", content="Complex reasoning task")],
        reasoning_tokens=10000  # Budget for internal reasoning
    )

    response = await service.chat(request)

    # Or use raw SDK
    openai_client = service.get_provider("openai").client
    response = await openai_client.chat.completions.create(
        model="o1",
        messages=[{"role": "user", "content": "Reasoning task"}],
        max_completion_tokens=5000  # Includes reasoning + output tokens
    )

Anthropic: Prompt Caching

from llmring import LLMRing, LLMRequest, Message

async with LLMRing() as service:
    # Using unified API
    request = LLMRequest(
        model="anthropic:claude-sonnet-4-5-20250929",
        messages=[
            Message(
                role="system",
                content="Very long system prompt with 1024+ tokens...",
                metadata={"cache_control": {"type": "ephemeral"}}
            ),
            Message(role="user", content="Hello")
        ]
    )

    response = await service.chat(request)

    # Or use raw SDK
    anthropic_client = service.get_provider("anthropic").client
    response = await anthropic_client.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=100,
        system=[{
            "type": "text",
            "text": "Long system prompt...",
            "cache_control": {"type": "ephemeral"}
        }],
        messages=[{"role": "user", "content": "Hello"}]
    )

    # Check cache usage
    print(f"Cache read tokens: {response.usage.cache_read_input_tokens}")
    print(f"Cache creation tokens: {response.usage.cache_creation_input_tokens}")

Anthropic: Extended Thinking

Extended thinking can be enabled via extra_params:

from llmring import LLMRing, LLMRequest, Message

async with LLMRing() as service:
    # Using unified API with extra_params
    request = LLMRequest(
        model="anthropic:claude-sonnet-4-5-20250929",
        messages=[Message(role="user", content="Complex reasoning problem...")],
        max_tokens=16000,
        extra_params={
            "thinking": {
                "type": "enabled",
                "budget_tokens": 10000
            }
        }
    )

    response = await service.chat(request)

    # Response may contain thinking content (check response structure)

# Or use raw SDK for full control
async with LLMRing() as service:
    anthropic_client = service.get_provider("anthropic").client

    response = await anthropic_client.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=16000,
        thinking={
            "type": "enabled",
            "budget_tokens": 10000
        },
        messages=[{
            "role": "user",
            "content": "Complex reasoning problem..."
        }]
    )

    # Access thinking content
    for block in response.content:
        if block.type == "thinking":
            print(f"Thinking: {block.thinking}")
        elif block.type == "text":
            print(f"Response: {block.text}")

Note: The unified API's reasoning_tokens parameter is for OpenAI reasoning models (o1, o3). For Anthropic extended thinking, use extra_params as shown above.

Google: Large Context and Multimodal

from llmring import LLMRing

async with LLMRing() as service:
    google_client = service.get_provider("google").client

    # Use the very large context window (1M+ tokens on Gemini 2.5 Pro)
    response = google_client.models.generate_content(
        model="gemini-2.5-pro",
        contents="Very long document with millions of tokens...",
        config={  # google-genai takes `config`, not `generation_config`
            "temperature": 0.7,
            "top_p": 0.8,
            "top_k": 40,
            "candidate_count": 1,
            "max_output_tokens": 8192
        }
    )

    # Multimodal (vision)
    from PIL import Image
    img = Image.open("image.jpg")

    response = google_client.models.generate_content(
        model="gemini-2.5-flash",
        contents=["What's in this image?", img]
    )

Ollama: Local Models and Custom Options

from llmring import LLMRing

async with LLMRing() as service:
    ollama_client = service.get_provider("ollama").client

    # Use local model with custom options
    response = await ollama_client.chat(
        model="llama3",
        messages=[{"role": "user", "content": "Hello"}],
        options={
            "temperature": 0.8,
            "top_k": 40,
            "top_p": 0.9,
            "num_predict": 256,
            "num_ctx": 4096,
            "repeat_penalty": 1.1
        }
    )

    # List available local models (ollama>=0.4 returns typed response objects)
    models = await ollama_client.list()
    for model in models.models:
        print(f"Model: {model.model}, Size: {model.size}")

Using extra_params

To pass provider-specific parameters through the unified API:

from llmring import LLMRing, LLMRequest, Message

async with LLMRing() as service:
    # Pass provider-specific params
    request = LLMRequest(
        model="openai:gpt-4o",
        messages=[Message(role="user", content="Hello")],
        extra_params={
            "logprobs": True,
            "top_logprobs": 5,
            "seed": 12345,
            "presence_penalty": 0.1
        }
    )

    response = await service.chat(request)

Multi-Provider Patterns

Provider Switching

from llmring import LLMRing, LLMRequest, Message

async with LLMRing() as service:
    # Same request, different providers
    messages = [Message(role="user", content="Hello")]

    # OpenAI
    response = await service.chat(
        LLMRequest(model="openai:gpt-4o", messages=messages)
    )

    # Anthropic
    response = await service.chat(
        LLMRequest(model="anthropic:claude-sonnet-4-5-20250929", messages=messages)
    )

    # Google
    response = await service.chat(
        LLMRequest(model="google:gemini-2.5-pro", messages=messages)
    )

    # Ollama
    response = await service.chat(
        LLMRequest(model="ollama:llama3", messages=messages)
    )

Automatic Fallback

Use a lockfile for automatic provider failover:

# llmring.lock
[[profiles.default.bindings]]
alias = "reliable"
models = [
    "anthropic:claude-sonnet-4-5-20250929",  # Try first
    "openai:gpt-4o",                 # If rate limited
    "google:gemini-2.5-pro",         # If both fail
    "ollama:llama3"                  # Local fallback
]

Then reference the alias from Python:

from llmring import LLMRing, LLMRequest, Message

async with LLMRing() as service:
    # Automatically tries fallbacks on failure
    request = LLMRequest(
        model="reliable",  # Uses fallback chain
        messages=[Message(role="user", content="Hello")]
    )

    response = await service.chat(request)
    print(f"Used model: {response.model}")

Cost Optimization: Try Cheaper First

from llmring import LLMRing, LLMRequest, Message

async with LLMRing() as service:
    messages = [Message(role="user", content="Simple task")]

    # Try cheap model first
    try:
        response = await service.chat(
            LLMRequest(model="openai:gpt-4o-mini", messages=messages)
        )
    except Exception:
        # Fall back to more capable model
        response = await service.chat(
            LLMRequest(model="anthropic:claude-sonnet-4-5-20250929", messages=messages)
        )

Provider-Specific Error Handling

from llmring import LLMRing, LLMRequest, Message
from llmring.exceptions import (
    ProviderRateLimitError,
    ProviderAuthenticationError,
    ModelNotFoundError
)

async with LLMRing() as service:
    try:
        request = LLMRequest(
            model="anthropic:claude-sonnet-4-5-20250929",
            messages=[Message(role="user", content="Hello")]
        )
        response = await service.chat(request)

    except ProviderRateLimitError as e:
        print(f"Rate limited, retry after {e.retry_after}s")
        # Try different provider
        request.model = "openai:gpt-4o"
        response = await service.chat(request)

    except ProviderAuthenticationError:
        print("Invalid API key")

    except ModelNotFoundError:
        print("Model not available")

Provider Comparison

| Provider  | Strengths                                        | Limitations                   | Best For                   |
|-----------|--------------------------------------------------|-------------------------------|----------------------------|
| OpenAI    | Fast, reliable, reasoning models (o1)            | Rate limits, cost             | General purpose, reasoning |
| Anthropic | Large context, prompt caching, extended thinking | Availability varies by region | Complex tasks, large docs  |
| Google    | Very large context, multimodal, fast             | Newer, less documentation     | Large context, vision      |
| Ollama    | Local, free, privacy                             | Requires local setup, slower  | Development, privacy       |

When to Use Raw SDK Access

Use unified LLMRing API when:

  • Switching between providers
  • Using aliases and profiles
  • Standard chat/streaming/tools
  • Want provider abstraction

Use raw SDK access when:

  • Need provider-specific features not in unified API
  • Performance-critical applications
  • Complex provider-specific configurations
  • Vendor-specific optimizations

Common Mistakes

Wrong: Not Checking Provider Availability

# DON'T DO THIS - provider may not be configured
provider = service.get_provider("anthropic")  # Raises ProviderNotFoundError if key missing!
client = provider.client

Right: Check Provider Availability

# DO THIS - handle missing providers
from llmring.exceptions import ProviderNotFoundError

try:
    provider = service.get_provider("anthropic")
    client = provider.client
except ProviderNotFoundError:
    print("Anthropic not configured - check ANTHROPIC_API_KEY")

Wrong: Hardcoding Provider

# DON'T DO THIS - locked to one provider
request = LLMRequest(
    model="openai:gpt-4o",
    messages=[...]
)

Right: Use Alias for Flexibility

# DO THIS - easy to switch providers
request = LLMRequest(
    model="assistant",  # Your semantic alias defined in lockfile
    messages=[...]
)

Wrong: Ignoring Provider-Specific Errors

# DON'T DO THIS - generic error handling
try:
    response = await service.chat(request)
except Exception as e:
    print(f"Error: {e}")

Right: Handle Provider-Specific Errors

# DO THIS - specific error types
from llmring.exceptions import (
    ProviderRateLimitError,
    ProviderTimeoutError
)

try:
    response = await service.chat(request)
except ProviderRateLimitError as e:
    # Try different provider
    request.model = "google:gemini-2.5-pro"
    response = await service.chat(request)
except ProviderTimeoutError:
    # Retry or use different provider
    pass

Best Practices

  1. Use aliases for flexibility: Don't hardcode provider:model references
  2. Configure fallbacks: Multiple providers in lockfile for high availability
  3. Check provider availability: Handle ProviderNotFoundError
  4. Use unified API when possible: Only drop to raw SDK when needed
  5. Handle provider-specific errors: Different providers have different failure modes
  6. Test with multiple providers: Ensure your code works across providers
  7. Document provider choices: Explain why you chose specific providers

Checking Available Providers

from llmring import LLMRing
from llmring.exceptions import ProviderNotFoundError

async with LLMRing() as service:
    # Check which providers are configured
    providers = []
    for provider_name in ["openai", "anthropic", "google", "ollama"]:
        try:
            service.get_provider(provider_name)
            providers.append(provider_name)
        except ProviderNotFoundError:
            pass

    print(f"Available providers: {', '.join(providers)}")

Related Skills

  • llmring-chat - Basic chat with unified API
  • llmring-streaming - Streaming across providers
  • llmring-tools - Tools with different providers
  • llmring-structured - Structured output across providers
  • llmring-lockfile - Configure provider aliases and fallbacks

Summary

Multi-provider patterns enable:

  • High availability (automatic failover)
  • Cost optimization (try cheaper first)
  • Provider diversity (avoid vendor lock-in)
  • Feature access (use best provider for each task)

Raw SDK access provides:

  • Provider-specific features (logprobs, caching, etc.)
  • Performance optimizations
  • Advanced configurations
  • Direct vendor SDK control

Recommendation: Use unified API with aliases for most work. Drop to raw SDK only when you need provider-specific features.