Agent Skills: Invoking Gemini

Invokes Google Gemini models for structured outputs, multi-modal tasks, and Google-specific features. Use when users request Gemini, structured JSON output, Google API integration, or cost-effective parallel processing.

UncategorizedID: oaustegard/claude-skills/invoking-gemini

Install this agent skill to your local

pnpm dlx add-skill https://github.com/oaustegard/claude-skills/tree/HEAD/invoking-gemini

Skill Files

Browse the full folder contents for invoking-gemini.

Download Skill

Loading file tree…

invoking-gemini/SKILL.md

Skill Metadata

Name
invoking-gemini
Description
Invokes Google Gemini models for structured outputs, image generation, multi-modal tasks, and Google-specific features. Use when users request Gemini, image generation, structured JSON output, Google API integration, or cost-effective parallel processing.

Invoking Gemini

Delegate tasks to Google's Gemini models when they offer advantages over Claude.

When to Use Gemini

Image generation:

  • Blog header images, illustrations, diagrams
  • Style-guided image creation (risograph, editorial, etc.)
  • Text rendering in images

Structured outputs:

  • JSON Schema validation with property ordering guarantees
  • Pydantic model compliance
  • Strict schema adherence (enum values, required fields)

Cost optimization:

  • Parallel batch processing (Gemini 3 Flash is lightweight)
  • High-volume simple tasks

Multi-modal tasks:

  • Image analysis with JSON output
  • Video processing
  • Audio transcription with structure

Setup

uv pip install requests pydantic

Credentials — Option A (recommended): Cloudflare AI Gateway

Source /mnt/project/proxy.env with CF_ACCOUNT_ID, CF_GATEWAY_ID, CF_API_TOKEN. Requests route through Cloudflare AI Gateway, bypassing IP blocks. Google API key stored in gateway via BYOK.

Credentials — Option B: Direct Google API

If no proxy.env, falls back to direct: GOOGLE_API_KEY.txt or API_CREDENTIALS.json.

Image Generation

Generate images using Gemini's native image models. This is the primary way to create illustrations, blog headers, diagrams, and visual content.

Quick Start

import sys
sys.path.append('/mnt/skills/user/invoking-gemini/scripts')
from gemini_client import generate_image

# One call — returns {"path": "...", "caption": "..."} or None
result = generate_image("A watercolor painting of a mountain lake at sunset")
print(result["path"])  # /mnt/user-data/outputs/gemini_image_1740000000.png

Function Signature

generate_image(
    prompt: str,                    # The image description
    output_path: str = None,        # Auto-generates if omitted
    model: str = "nano-banana-2",   # Default: fast. Use "image-pro" for quality
    temperature: float = 0.7,       # 0.5-0.7 for diagrams, 0.7-0.8 for illustrations
) -> dict | None
# Returns: {"path": "/mnt/user-data/outputs/gemini_image_*.png", "caption": str|None}
# Returns None on failure

Model Selection

| Alias | Model | Best For | Cost/image | |-------|-------|----------|------------| | "nano-banana-2" or "image" | gemini-3.1-flash-image-preview | Fast iteration, drafts | $0.067 | | "image-pro" or "nano-banana-pro" | gemini-3-pro-image-preview | Published content, text rendering | $0.134 |

Complete Blog Header Example

import sys
sys.path.append('/mnt/skills/user/invoking-gemini/scripts')
from gemini_client import generate_image

# 1. Compose prompt with style prefix + subject
style_prefix = (
    "Style: Risograph-inspired editorial illustration. "
    "Visible halftone dot texture and slight color misregistration between layers. "
    "Limited ink palette: deep indigo, warm coral, and sage green on off-white paper. "
    "Layered transparency where colors overlap creates rich secondary tones. "
    "Modern and professional — the aesthetic of an indie design studio, not a fantasy novel. "
    "Generous whitespace. No photorealism, no glow effects, no cyberpunk. No text or labels."
)
subject = "A raven perched on a stack of books, observing a network graph"
prompt = f"{style_prefix}\n\nSubject: {subject}. Wide landscape format, suitable as a blog header."

# 2. Generate (use image-pro for published content)
result = generate_image(prompt, model="image-pro", temperature=0.75)

if result:
    print(f"Saved: {result['path']}")
    # 3. Present to user
    # present_files([result["path"]])

Prompt Patterns

  • Style prefix + subject: Prepend a style description, then describe the subject
  • Be specific about style: "Risograph-inspired editorial illustration" not "a nice picture"
  • Include composition: "Wide landscape format" / "centered, high contrast"
  • Text rendering: "A poster with the text 'SALE' in bold red letters" (works well with image-pro)
  • Negative constraints: "No photorealism, no glow effects" to avoid defaults

Custom Output Path

result = generate_image(
    "A logo for a coffee shop called 'Bean There'",
    output_path="/mnt/user-data/outputs/coffee_logo.png"
)

Basic Text Usage

import sys
sys.path.append('/mnt/skills/user/invoking-gemini/scripts')
from gemini_client import invoke_gemini

response = invoke_gemini(
    prompt="Explain quantum computing in 3 bullet points",
    model="flash",  # gemini-3.5-flash (default)
)
print(response)

Structured Output

Use Pydantic models for guaranteed JSON Schema compliance:

from gemini_client import invoke_with_structured_output
from pydantic import BaseModel, Field

class BookAnalysis(BaseModel):
    title: str
    genre: str = Field(description="Primary genre")
    key_themes: list[str] = Field(max_length=5)
    rating: int = Field(ge=1, le=5)

result = invoke_with_structured_output(
    prompt="Analyze the book '1984' by George Orwell",
    pydantic_model=BookAnalysis
)
print(result.title)  # "1984"

Parallel Invocation

from gemini_client import invoke_parallel

results = invoke_parallel(
    prompts=["Summarize Hamlet", "Summarize Macbeth", "Summarize Othello"],
    model="lite",  # gemini-2.5-flash-lite — cheapest, fastest for batch
)

Available Models

All current Gemini 3.x text/multimodal models are in preview except 3.5 Flash (GA May 19, 2026). Use the values below — gemini-3-flash-preview and gemini-3.1-flash-lite-preview from earlier docs are out of date.

Text / Reasoning Models

| Model | Alias | Input/1M | Output/1M | Context | Notes | |-------|-------|----------|-----------|---------|-------| | gemini-3.5-flash | flash | $1.50 | $9.00 | 1M | GA May 2026. Frontier Flash. Beats 3.1 Pro on most coding/agentic benchmarks. Default thinking_level=medium — set minimal for non-reasoning tasks. | | gemini-3-flash-preview | flash-3 | $0.30 | $2.50 | 1M | Prior-gen Flash, kept for back compat | | gemini-3.1-pro-preview | pro | $2.00 (≤200K) / $4.00 | $12.00 / $24.00 | 1M | Current Pro tier; 3.5 Pro slated for June 2026 | | gemini-2.5-flash | stable-flash | $0.30 | $2.50 | 1M | Stable production Flash | | gemini-2.5-flash-lite | lite | $0.10 | $0.40 | 1M | Cheapest major-provider production model. Surprisingly strong on multimodal extraction. | | gemini-2.5-pro | stable-pro | $1.25 (≤200K) / $2.50 | $10.00 / $20.00 | 1M | Stable production Pro |

Image Models

| Model | Alias | Input/1M | Per Image | |-------|-------|----------|-----------| | gemini-3.1-flash-image-preview | image, nano-banana-2 | $0.25 | $0.067 | | gemini-3-pro-image-preview | image-pro, nano-banana-pro | $2.00 | $0.134 |

See references/models.md for full details.

Thinking Budget (Gemini 3.x)

Gemini 3.x models reason before responding. The parameter changed in 2026: integer thinking_budget is gone; use string thinking_level ∈ {minimal, low, medium, high}. Default for 3.5 Flash is medium. For transcription / classification / extraction tasks, pass thinking_level='minimal' or the model will silently spend output tokens on reasoning (symptom: empty response with finishReason=MAX_TOKENS).

response = invoke_gemini(
    prompt="Transcribe this image.",
    model="flash",
    image_path="/tmp/screenshot.png",
    max_output_tokens=4000,
    thinking_level="minimal",  # don't burn output budget on reasoning
)

Error Handling

response = invoke_gemini(prompt="...", model="flash")
if response is None:
    print("API call failed — check credentials")

result = generate_image("...")
if result is None:
    print("Image generation failed — check credentials or try again")

Common issues: Missing API key → see Setup. Rate limit → auto-retries with backoff. Network error → returns None.

Advanced Features

Custom Generation Config

response = invoke_gemini(
    prompt="Write a haiku",
    model="flash",                  # gemini-3.5-flash
    temperature=0.9,
    max_output_tokens=200,
    top_p=0.95,
    thinking_level="low",           # haiku is short; modest reasoning is fine
)

Multi-modal Input

from pydantic import BaseModel
from gemini_client import invoke_with_structured_output

class ImageDescription(BaseModel):
    objects: list[str]
    scene: str
    colors: list[str]

result = invoke_with_structured_output(
    prompt="Describe this image",
    pydantic_model=ImageDescription,
    image_path="/mnt/user-data/uploads/photo.jpg"
)

See references/advanced.md for more patterns.

Troubleshooting

"No credentials configured": Create /mnt/project/proxy.env with CF credentials, or add GOOGLE_API_KEY.txt.

CF Gateway 401/403: Verify CF_API_TOKEN has AI Gateway permissions. If not using BYOK, add GOOGLE_API_KEY to proxy.env.

Import errors: uv pip install requests pydantic

Image generation returns None: Check credentials. If persistent, try model="nano-banana-2" (more reliable than image-pro). Check for content policy blocks in error output.