Agent Skills: Gemini 3 Pro API Integration

Gemini 3 Pro API/SDK integration for text generation, reasoning, and chat. Covers setup, authentication, thinking levels, streaming, and production deployment. Use when working with Gemini 3 Pro API, Python SDK, Node.js SDK, text generation, chat applications, or advanced reasoning tasks.

UncategorizedID: adaptationio/skrillz/gemini-3-pro-api

Install this agent skill to your local

pnpm dlx add-skill https://github.com/adaptationio/Skrillz/tree/HEAD/.claude/skills/gemini-3-pro-api

Skill Files

Browse the full folder contents for gemini-3-pro-api.

Download Skill

Loading file tree…

.claude/skills/gemini-3-pro-api/SKILL.md

Skill Metadata

Name
gemini-3-pro-api
Description
Gemini 3 Pro API/SDK integration for text generation, reasoning, and chat. Covers setup, authentication, thinking levels, streaming, and production deployment. Use when working with Gemini 3 Pro API, Python SDK, Node.js SDK, text generation, chat applications, or advanced reasoning tasks.

Gemini 3 Pro API Integration

Comprehensive guide for integrating Google's Gemini 3 Pro API/SDK into your applications. Covers setup, authentication, text generation, advanced reasoning with dynamic thinking, chat applications, streaming responses, and production deployment patterns.

Overview

Gemini 3 Pro (gemini-3-pro-preview) is Google's most intelligent model designed for complex tasks requiring advanced reasoning and broad world knowledge. This skill provides complete workflows for API integration using Python or Node.js SDKs.

Key Capabilities

  • Massive Context: 1M token input, 64k token output
  • Dynamic Thinking: Adaptive reasoning with high/low modes
  • Streaming: Real-time token delivery
  • Chat: Multi-turn conversations with history
  • Production-Ready: Error handling, retry logic, cost optimization

When to Use This Skill

  • Setting up Gemini 3 Pro API access
  • Building text generation applications
  • Implementing chat applications with reasoning
  • Configuring advanced thinking modes
  • Deploying production Gemini applications
  • Optimizing API usage and costs

Quick Start

Prerequisites

Python Quick Start

# Install SDK
pip install google-genai

# Basic usage
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-3-pro-preview")

response = model.generate_content("Explain quantum computing")
print(response.text)

Node.js Quick Start

// Install SDK
npm install @google/generative-ai

// Basic usage
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI("YOUR_API_KEY");
const model = genAI.getGenerativeModel({ model: "gemini-3-pro-preview" });

const result = await model.generateContent("Explain quantum computing");
console.log(result.response.text());

Core Workflows

Workflow 1: Quick Start Setup

Goal: Get from zero to first successful API call in < 5 minutes.

Steps:

  1. Get API Key

    • Visit Google AI Studio
    • Create or select project
    • Generate API key
    • Copy key securely
  2. Install SDK

    # Python
    pip install google-genai
    
    # Node.js
    npm install @google/generative-ai
    
  3. Configure Authentication

    # Python - using environment variable (recommended)
    import os
    import google.generativeai as genai
    
    genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
    
    // Node.js - using environment variable (recommended)
    const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
    
  4. Make First API Call

    # Python
    model = genai.GenerativeModel("gemini-3-pro-preview")
    response = model.generate_content("Write a haiku about coding")
    print(response.text)
    
  5. Verify Success

    • Check response received
    • Verify text output
    • Note token usage
    • Confirm API key working

Expected Outcome: Working API integration in under 5 minutes.


Workflow 2: Chat Application Development

Goal: Build a production-ready chat application with conversation history and streaming.

Steps:

  1. Initialize Chat Model

    # Python
    model = genai.GenerativeModel(
        "gemini-3-pro-preview",
        generation_config={
            "thinking_level": "high",  # Dynamic reasoning
            "temperature": 1.0,  # Keep at 1.0 for best results
            "max_output_tokens": 8192
        }
    )
    
  2. Start Chat Session

    chat = model.start_chat(history=[])
    
  3. Send Message with Streaming

    response = chat.send_message(
        "Explain how neural networks learn",
        stream=True
    )
    
    # Stream tokens in real-time
    for chunk in response:
        print(chunk.text, end="", flush=True)
    
  4. Manage Conversation History

    # History is automatically maintained
    # Access it anytime
    print(f"Conversation turns: {len(chat.history)}")
    
    # Continue conversation
    response = chat.send_message("Can you give an example?")
    
  5. Handle Thought Signatures

    • SDKs handle automatically in standard chat flows
    • No manual intervention needed for basic use
    • See references/thought-signatures.md for advanced cases
  6. Implement Error Handling

    import time
    from google.api_core import retry, exceptions
    
    @retry.Retry(predicate=retry.if_exception_type(
        exceptions.ResourceExhausted,
        exceptions.ServiceUnavailable
    ))
    def send_with_retry(chat, message):
        return chat.send_message(message)
    
    try:
        response = send_with_retry(chat, user_input)
    except exceptions.GoogleAPIError as e:
        print(f"API error: {e}")
    

Expected Outcome: Production-ready chat application with streaming, history, and error handling.


Workflow 3: Production Deployment

Goal: Deploy Gemini 3 Pro integration with monitoring, cost control, and reliability.

Steps:

  1. Setup Authentication (Production)

    # Use environment variables (never hardcode keys)
    import os
    from pathlib import Path
    
    # Option 1: Environment variable
    api_key = os.getenv("GEMINI_API_KEY")
    
    # Option 2: Secrets manager (recommended for production)
    # Use Google Secret Manager, AWS Secrets Manager, etc.
    
  2. Configure Production Settings

    model = genai.GenerativeModel(
        "gemini-3-pro-preview",
        generation_config={
            "thinking_level": "high",  # or "low" for simple tasks
            "temperature": 1.0,  # CRITICAL: Keep at 1.0
            "max_output_tokens": 4096,
            "top_p": 0.95,
            "top_k": 40
        },
        safety_settings={
            # Configure content filtering as needed
        }
    )
    
  3. Implement Comprehensive Error Handling

    from google.api_core import exceptions, retry
    import logging
    
    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger(__name__)
    
    def generate_with_fallback(prompt, max_retries=3):
        @retry.Retry(
            predicate=retry.if_exception_type(
                exceptions.ResourceExhausted,
                exceptions.ServiceUnavailable,
                exceptions.DeadlineExceeded
            ),
            initial=1.0,
            maximum=10.0,
            multiplier=2.0,
            deadline=60.0
        )
        def _generate():
            return model.generate_content(prompt)
    
        try:
            return _generate()
        except exceptions.InvalidArgument as e:
            logger.error(f"Invalid argument: {e}")
            raise
        except exceptions.PermissionDenied as e:
            logger.error(f"Permission denied: {e}")
            raise
        except Exception as e:
            logger.error(f"Unexpected error: {e}")
            # Fallback to simpler model or cached response
            return None
    
  4. Monitor Usage and Costs

    def log_usage(response):
        usage = response.usage_metadata
        logger.info(f"Tokens - Input: {usage.prompt_token_count}, "
                    f"Output: {usage.candidates_token_count}, "
                    f"Total: {usage.total_token_count}")
    
        # Estimate cost (for prompts ≤200k tokens)
        input_cost = (usage.prompt_token_count / 1_000_000) * 2.00
        output_cost = (usage.candidates_token_count / 1_000_000) * 12.00
        total_cost = input_cost + output_cost
    
        logger.info(f"Estimated cost: ${total_cost:.6f}")
    
    response = model.generate_content(prompt)
    log_usage(response)
    
  5. Implement Rate Limiting

    import time
    from collections import deque
    
    class RateLimiter:
        def __init__(self, max_requests_per_minute=60):
            self.max_rpm = max_requests_per_minute
            self.requests = deque()
    
        def wait_if_needed(self):
            now = time.time()
            # Remove requests older than 1 minute
            while self.requests and self.requests[0] < now - 60:
                self.requests.popleft()
    
            # Check if at limit
            if len(self.requests) >= self.max_rpm:
                sleep_time = 60 - (now - self.requests[0])
                if sleep_time > 0:
                    time.sleep(sleep_time)
    
            self.requests.append(now)
    
    limiter = RateLimiter(max_requests_per_minute=60)
    
    def generate_with_rate_limit(prompt):
        limiter.wait_if_needed()
        return model.generate_content(prompt)
    
  6. Setup Logging and Monitoring

    import logging
    from datetime import datetime
    
    # Configure logging
    logging.basicConfig(
        level=logging.INFO,
        format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
        handlers=[
            logging.FileHandler('gemini_api.log'),
            logging.StreamHandler()
        ]
    )
    
    logger = logging.getLogger(__name__)
    
    def monitored_generate(prompt):
        start_time = datetime.now()
        try:
            response = model.generate_content(prompt)
            duration = (datetime.now() - start_time).total_seconds()
    
            logger.info(f"Success - Duration: {duration}s, "
                        f"Tokens: {response.usage_metadata.total_token_count}")
            return response
        except Exception as e:
            duration = (datetime.now() - start_time).total_seconds()
            logger.error(f"Failed - Duration: {duration}s, Error: {e}")
            raise
    

Expected Outcome: Production-ready deployment with monitoring, cost control, error handling, and rate limiting.


Thinking Levels

Dynamic Thinking System

Gemini 3 Pro introduces thinking_level to control reasoning depth:

thinking_level: "high" (default)

  • Maximum reasoning depth
  • Best quality for complex tasks
  • Slower first-token response
  • Higher cost
  • Use for: Complex reasoning, coding, analysis, research

thinking_level: "low"

  • Minimal reasoning overhead
  • Faster response
  • Lower cost
  • Simpler output
  • Use for: Simple questions, factual answers, quick queries

Configuration

# Python
model = genai.GenerativeModel(
    "gemini-3-pro-preview",
    generation_config={
        "thinking_level": "high"  # or "low"
    }
)
// Node.js
const model = genAI.getGenerativeModel({
  model: "gemini-3-pro-preview",
  generationConfig: {
    thinking_level: "high"  // or "low"
  }
});

Critical Notes

⚠️ Temperature MUST stay at 1.0 - Changing temperature can cause looping or degraded performance on complex reasoning tasks.

⚠️ Cannot combine thinking_level with legacy thinking_budget parameter.

See references/thinking-levels.md for detailed guide.


Streaming Responses

Python Streaming

response = model.generate_content(
    "Write a long article about AI",
    stream=True
)

for chunk in response:
    print(chunk.text, end="", flush=True)

Node.js Streaming

const result = await model.generateContentStream("Write a long article about AI");

for await (const chunk of result.stream) {
    process.stdout.write(chunk.text());
}

Benefits

  • Lower perceived latency
  • Real-time user feedback
  • Better UX for long responses
  • Can process tokens as they arrive

See references/streaming.md for advanced patterns.


Cost Optimization

Pricing (Gemini 3 Pro)

| Context Size | Input | Output | |-------------|-------|--------| | ≤ 200k tokens | $2/1M | $12/1M | | > 200k tokens | $4/1M | $18/1M |

Optimization Strategies

  1. Keep prompts under 200k tokens (50% cheaper)
  2. Use thinking_level: "low" for simple tasks (faster, lower cost)
  3. Implement context caching for reusable contexts (see gemini-3-advanced skill)
  4. Monitor token usage and set budgets
  5. Use Gemini 1.5 Flash for simple tasks (20x cheaper)

See references/best-practices.md for comprehensive cost optimization.


Model Selection

Gemini 3 Pro vs Other Models

| Model | Context | Output | Input Price | Best For | |-------|---------|--------|-------------|----------| | gemini-3-pro-preview | 1M | 64k | $2-4/1M | Complex reasoning, coding | | gemini-1.5-pro | 1M | 8k | $7-14/1M | General use, multimodal | | gemini-1.5-flash | 1M | 8k | $0.35-0.70/1M | Simple tasks, cost-sensitive |

When to Use Gemini 3 Pro

✅ Complex reasoning tasks ✅ Advanced coding problems ✅ Long-context analysis (up to 1M tokens) ✅ Large output requirements (up to 64k tokens) ✅ Tasks requiring dynamic thinking

When to Use Alternatives

  • Gemini 1.5 Flash: Simple tasks, cost-sensitive applications
  • Gemini 1.5 Pro: Multimodal tasks, general use
  • Gemini 2.5 models: Experimental features, specific capabilities

Error Handling

Common Errors

| Error | Cause | Solution | |-------|-------|----------| | ResourceExhausted | Rate limit exceeded | Implement retry with backoff | | InvalidArgument | Invalid parameters | Validate input, check docs | | PermissionDenied | Invalid API key | Check authentication | | DeadlineExceeded | Request timeout | Reduce context, retry |

Production Error Handling

from google.api_core import exceptions, retry

@retry.Retry(
    predicate=retry.if_exception_type(
        exceptions.ResourceExhausted,
        exceptions.ServiceUnavailable
    ),
    initial=1.0,
    maximum=60.0,
    multiplier=2.0
)
def safe_generate(prompt):
    try:
        return model.generate_content(prompt)
    except exceptions.InvalidArgument as e:
        logger.error(f"Invalid argument: {e}")
        raise
    except exceptions.PermissionDenied as e:
        logger.error(f"Permission denied - check API key: {e}")
        raise
    except Exception as e:
        logger.error(f"Unexpected error: {e}")
        raise

See references/error-handling.md for comprehensive patterns.


References

Setup & Configuration

Features

Production

Official Resources


Next Steps

After Basic Setup

  1. Explore chat applications - Build conversational interfaces
  2. Add multimodal capabilities - Use gemini-3-multimodal skill
  3. Add image generation - Use gemini-3-image-generation skill
  4. Add advanced features - Use gemini-3-advanced skill (caching, tools, batch)

Common Integration Patterns

  • Simple Chatbot: This skill only
  • Multimodal Assistant: This skill + gemini-3-multimodal
  • Creative Bot: This skill + gemini-3-image-generation
  • Production App: All 4 Gemini 3 skills

Troubleshooting

Issue: API key not working

Solution: Verify API key in Google AI Studio, check environment variable

Issue: Rate limit errors

Solution: Implement rate limiting, upgrade to paid tier, reduce request frequency

Issue: Slow responses

Solution: Use thinking_level: "low" for simple tasks, enable streaming, reduce context size

Issue: High costs

Solution: Keep prompts under 200k tokens, use appropriate thinking level, consider Gemini 1.5 Flash for simple tasks

Issue: Temperature warnings

Solution: Keep temperature at 1.0 (default) - do not modify for complex reasoning tasks


Summary

This skill provides everything needed to integrate Gemini 3 Pro API into your applications:

✅ Quick setup (< 5 minutes) ✅ Production-ready chat applications ✅ Dynamic thinking configuration ✅ Streaming responses ✅ Error handling and retry logic ✅ Cost optimization strategies ✅ Monitoring and logging patterns

For multimodal, image generation, and advanced features, see the companion skills.

Ready to build? Start with Workflow 1: Quick Start Setup above!