
Gemini API

Use Google Gemini API via REST for text generation, multimodal analysis, image generation, and more.

Prerequisites

Environment variable GOOGLE_API_KEY must be set
API endpoint: https://generativelanguage.googleapis.com/v1beta
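
The two prerequisites can be wrapped in a small helper. The sketch below is a hypothetical `gemini_post` function (not part of this skill) that refuses to run when GOOGLE_API_KEY is unset and sends the key in the `x-goog-api-key` header rather than the `?key=` query parameter used elsewhere in this document, which keeps the key out of logged URLs. `GEMINI_BASE_URL` and `CURL_BIN` are assumed names; `CURL_BIN` exists only so the call can be dry-run.

```shell
# Hypothetical helper, assuming the v1beta endpoint listed above.
GEMINI_BASE_URL="https://generativelanguage.googleapis.com/v1beta"

gemini_post() {
    # $1 = model name, $2 = method (e.g. generateContent), $3 = JSON body
    if [ -z "${GOOGLE_API_KEY:-}" ]; then
        echo "GOOGLE_API_KEY is not set" >&2
        return 1
    fi
    # CURL_BIN is an assumed override hook; it defaults to the real curl.
    "${CURL_BIN:-curl}" -sS -X POST \
        "$GEMINI_BASE_URL/models/$1:$2" \
        -H "Content-Type: application/json" \
        -H "x-goog-api-key: $GOOGLE_API_KEY" \
        -d "$3"
}
```

For example, `gemini_post gemini-2.5-flash generateContent '{"contents":[{"parts":[{"text":"Hello"}]}]}'` would send the same request as the basic prompt example, and `CURL_BIN=echo gemini_post ...` prints the arguments instead of calling the API.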

Available Models

| Model | Use Case |
|-------|----------|
| gemini-2.5-flash | Fast text generation (default) |
| gemini-2.5-pro | High quality text generation |
| gemini-3-flash-preview | Latest flash model |
| gemini-3-pro-preview | Latest pro model |
| gemini-2.5-flash-image | Image generation (Nano Banana) |
| gemini-3-pro-image-preview | Advanced image generation with thinking & search |

Workflow

Phase 1: Determine Task Type

Based on the user's request, identify which capability to use:

Text Generation: Basic prompts, chat, Q&A
Multimodal Analysis: Analyze images, videos, or audio
Image Generation: Create or edit images (Nano Banana)
Function Calling: Execute custom functions
Search Grounding: Real-time web search integration
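
The mapping from task type to model can be sketched as a lookup. `pick_model` and its task keywords are hypothetical names chosen for illustration; the model choices simply mirror the ones used in the curl examples in this document and are not a rule imposed by the API.

```shell
# Hypothetical lookup: task keyword -> model name used in this document's examples.
pick_model() {
    case "$1" in
        text|multimodal|function|search) echo "gemini-2.5-flash" ;;
        image)                           echo "gemini-2.5-flash-image" ;;
        image-pro)                       echo "gemini-3-pro-image-preview" ;;
        *)
            echo "unknown task type: $1" >&2
            return 1
            ;;
    esac
}
```

A caller might write `MODEL=$(pick_model image)` and substitute `$MODEL` into the request URL.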

Phase 2: Execute API Call

Use the appropriate curl command based on task type.

1. Text Generation

Basic Prompt

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key=$GOOGLE_API_KEY" \
    -H 'Content-Type: application/json' \
    -X POST \
    -d '{
      "contents": [{
        "parts": [{"text": "Your prompt here"}]
      }]
    }'

With Configuration

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key=$GOOGLE_API_KEY" \
    -H 'Content-Type: application/json' \
    -X POST \
    -d '{
      "contents": [{
        "parts": [{"text": "Your prompt here"}]
      }],
      "generationConfig": {
        "temperature": 0.9,
        "maxOutputTokens": 2000,
        "stopSequences": ["END"]
      }
    }'

Multi-turn Chat

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key=$GOOGLE_API_KEY" \
    -H 'Content-Type: application/json' \
    -X POST \
    -d '{
      "contents": [
        {"role": "user", "parts": [{"text": "First message"}]},
        {"role": "model", "parts": [{"text": "Model response"}]},
        {"role": "user", "parts": [{"text": "Follow-up question"}]}
      ]
    }'

System Instructions

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key=$GOOGLE_API_KEY" \
    -H 'Content-Type: application/json' \
    -X POST \
    -d '{
      "system_instruction": {
        "parts": [{"text": "You are a helpful assistant that speaks like a pirate."}]
      },
      "contents": [{
        "parts": [{"text": "Hello!"}]
      }]
    }'

JSON Mode Output

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key=$GOOGLE_API_KEY" \
    -H 'Content-Type: application/json' \
    -X POST \
    -d '{
      "contents": [{
        "parts": [{"text": "List 3 colors as JSON array"}]
      }],
      "generationConfig": {
        "response_mime_type": "application/json"
      }
    }'

Streaming Response

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:streamGenerateContent?alt=sse&key=$GOOGLE_API_KEY" \
    -H 'Content-Type: application/json' \
    -X POST \
    -d '{
      "contents": [{
        "parts": [{"text": "Write a long story"}]
      }]
    }'
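
With `alt=sse`, the response arrives as server-sent events, where each chunk is a line starting with `data: ` followed by a JSON object. A minimal sketch for stripping that framing is shown below; `sse_payloads` is a hypothetical helper name, and the sketch assumes each event fits on a single `data:` line.

```shell
# Hypothetical helper: read an SSE stream on stdin, print one JSON chunk per line.
sse_payloads() {
    # Drop carriage returns, then keep only "data: " lines without the prefix.
    tr -d '\r' | sed -n 's/^data: //p'
}
```

Piping the streaming curl command above into `sse_payloads` (and adding curl's `-N` flag to disable output buffering) would print each chunk as it arrives.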

Safety Settings

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key=$GOOGLE_API_KEY" \
    -H 'Content-Type: application/json' \
    -X POST \
    -d '{
      "contents": [{
        "parts": [{"text": "Your prompt"}]
      }],
      "safetySettings": [
        {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_ONLY_HIGH"},
        {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_ONLY_HIGH"}
      ]
    }'

2. Multimodal Analysis

Image Analysis (Base64 Inline)

# First encode the image to base64 (tr strips line wraps, so this works with both GNU and BSD base64)
BASE64_IMAGE=$(base64 < image.jpg | tr -d '\n')

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key=$GOOGLE_API_KEY" \
    -H 'Content-Type: application/json' \
    -X POST \
    -d '{
      "contents": [{
        "parts": [
          {"text": "Describe this image in detail"},
          {"inline_data": {"mime_type": "image/jpeg", "data": "'$BASE64_IMAGE'"}}
        ]
      }]
    }'
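
Splicing a large base64 string into the command line can exceed the shell's argument length limit. One way around this, sketched below, is to write the payload to a file and pass it with `-d @payload.json`. `build_image_payload` is a hypothetical helper; it assumes the prompt contains no double quotes or backslashes, since it does no JSON escaping.

```shell
# Hypothetical helper: print a generateContent payload with one inline image.
build_image_payload() {
    # $1 = image path, $2 = MIME type, $3 = prompt (no quotes or backslashes)
    local b64
    # tr removes line wraps so the output is identical on GNU and BSD base64.
    b64=$(base64 < "$1" | tr -d '\n')
    cat <<EOF
{
  "contents": [{
    "parts": [
      {"text": "$3"},
      {"inline_data": {"mime_type": "$2", "data": "$b64"}}
    ]
  }]
}
EOF
}
```

For example, `build_image_payload image.jpg image/jpeg "Describe this image in detail" > payload.json`, followed by the curl command above with `-d @payload.json` in place of the inline body.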

Video Analysis (File API)

Step 1: Upload Video

# Get upload URL (wc -c is used for the file size because stat flags differ between GNU and BSD)
UPLOAD_URL=$(curl -s "https://generativelanguage.googleapis.com/upload/v1beta/files?key=$GOOGLE_API_KEY" \
    -H "X-Goog-Upload-Protocol: resumable" \
    -H "X-Goog-Upload-Command: start" \
    -H "X-Goog-Upload-Header-Content-Length: $(wc -c < video.mp4 | tr -d ' ')" \
    -H "X-Goog-Upload-Header-Content-Type: video/mp4" \
    -H "Content-Type: application/json" \
    -d '{"file": {"display_name": "video.mp4"}}' \
    -D - | grep -i "x-goog-upload-url" | cut -d' ' -f2 | tr -d '\r')

# Upload file
curl "$UPLOAD_URL" \
    -H "X-Goog-Upload-Offset: 0" \
    -H "X-Goog-Upload-Command: upload, finalize" \
    -H "Content-Type: video/mp4" \
    --data-binary @video.mp4
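
The two pieces of parsing in this step can be sketched as small filters. Both function names are hypothetical. `extract_upload_url` does the same job as the `grep | cut | tr` pipeline above, and `extract_file_uri` assumes the upload response is JSON containing a `"uri"` field under `file`, which is the value to substitute for FILE_URI_FROM_UPLOAD in the next step.

```shell
# Hypothetical filter: read HTTP response headers on stdin, print the upload URL.
extract_upload_url() {
    tr -d '\r' | awk 'tolower($1) == "x-goog-upload-url:" { print $2 }'
}

# Hypothetical filter: read the upload response JSON on stdin, print the first "uri" value.
# A regex sketch only; a JSON parser such as jq would be more robust if available.
extract_file_uri() {
    sed -n 's/.*"uri": *"\([^"]*\)".*/\1/p' | head -n 1
}
```

For example, saving the output of the "Upload file" command to `upload.json` and running `FILE_URI=$(extract_file_uri < upload.json)` would capture the URI for the query step.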

Step 2: Query with Video

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key=$GOOGLE_API_KEY" \
    -H 'Content-Type: application/json' \
    -X POST \
    -d '{
      "contents": [{
        "parts": [
          {"text": "Describe what happens in this video"},
          {"file_data": {"mime_type": "video/mp4", "file_uri": "FILE_URI_FROM_UPLOAD"}}
        ]
      }]
    }'

Audio Analysis

As with video, upload the file via the File API, then query it:

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key=$GOOGLE_API_KEY" \
    -H 'Content-Type: application/json' \
    -X POST \
    -d '{
      "contents": [{
        "parts": [
          {"text": "Transcribe and summarize this audio"},
          {"file_data": {"mime_type": "audio/mp3", "file_uri": "FILE_URI_FROM_UPLOAD"}}
        ]
      }]
    }'

3. Image Generation (Nano Banana)

Basic Image Generation

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image:generateContent?key=$GOOGLE_API_KEY" \
    -H 'Content-Type: application/json' \
    -X POST \
    -d '{
      "contents": [{"parts": [{"text": "Create a photorealistic image of a cat wearing a hat"}]}],
      "generationConfig": {
        "responseModalities": ["TEXT", "IMAGE"]
      }
    }'

With Aspect Ratio Control

Supported ratios: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image:generateContent?key=$GOOGLE_API_KEY" \
    -H 'Content-Type: application/json' \
    -X POST \
    -d '{
      "contents": [{"parts": [{"text": "Create a landscape scene"}]}],
      "generationConfig": {
        "responseModalities": ["IMAGE"],
        "imageConfig": {
          "aspectRatio": "16:9"
        }
      }
    }'

Image Editing (Character Consistency)

BASE64_IMAGE=$(base64 < original.jpg | tr -d '\n')

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image:generateContent?key=$GOOGLE_API_KEY" \
    -H 'Content-Type: application/json' \
    -X POST \
    -d '{
      "contents": [{
        "parts": [
          {"text": "Put this character in a tropical forest"},
          {"inline_data": {"mime_type": "image/jpeg", "data": "'$BASE64_IMAGE'"}}
        ]
      }],
      "generationConfig": {
        "responseModalities": ["TEXT", "IMAGE"]
      }
    }'

High Resolution (Pro Model - 2K/4K)

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent?key=$GOOGLE_API_KEY" \
    -H 'Content-Type: application/json' \
    -X POST \
    -d '{
      "contents": [{"parts": [{"text": "A photo of an oak tree in all four seasons"}]}],
      "generationConfig": {
        "responseModalities": ["TEXT", "IMAGE"],
        "imageConfig": {
          "aspectRatio": "1:1",
          "imageSize": "4K"
        }
      }
    }'

Image Generation with Search Grounding (Pro)

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent?key=$GOOGLE_API_KEY" \
    -H 'Content-Type: application/json' \
    -X POST \
    -d '{
      "contents": [{"parts": [{"text": "Visualize the current weather forecast for Tokyo as a chart"}]}],
      "generationConfig": {
        "responseModalities": ["TEXT", "IMAGE"],
        "imageConfig": {
          "aspectRatio": "16:9"
        }
      },
      "tools": [{"google_search": {}}]
    }'

Multi-Image Fusion

BASE64_IMG1=$(base64 < image1.jpg | tr -d '\n')
BASE64_IMG2=$(base64 < image2.jpg | tr -d '\n')

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image:generateContent?key=$GOOGLE_API_KEY" \
    -H 'Content-Type: application/json' \
    -X POST \
    -d '{
      "contents": [{
        "parts": [
          {"text": "Combine these two characters in a fantasy world"},
          {"inline_data": {"mime_type": "image/jpeg", "data": "'$BASE64_IMG1'"}},
          {"inline_data": {"mime_type": "image/jpeg", "data": "'$BASE64_IMG2'"}}
        ]
      }],
      "generationConfig": {
        "responseModalities": ["TEXT", "IMAGE"]
      }
    }'

4. Function Calling

Define and Call Functions

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key=$GOOGLE_API_KEY" \
    -H 'Content-Type: application/json' \
    -X POST \
    -d '{
      "contents": [{
        "role": "user",
        "parts": [{"text": "What movies are playing in Mountain View?"}]
      }],
      "tools": [{
        "function_declarations": [{
          "name": "find_movies",
          "description": "Find movies playing in theaters",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {"type": "string", "description": "City and state"},
              "genre": {"type": "string", "description": "Movie genre"}
            },
            "required": ["location"]
          }
        }]
      }]
    }'

Provide Function Response

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key=$GOOGLE_API_KEY" \
    -H 'Content-Type: application/json' \
    -X POST \
    -d '{
      "contents": [
        {"role": "user", "parts": [{"text": "What movies are playing in Mountain View?"}]},
        {"role": "model", "parts": [{"functionCall": {"name": "find_movies", "args": {"location": "Mountain View, CA"}}}]},
        {"role": "user", "parts": [{"functionResponse": {"name": "find_movies", "response": {"movies": ["Barbie", "Oppenheimer"]}}}]}
      ],
      "tools": [{
        "function_declarations": [{
          "name": "find_movies",
          "description": "Find movies playing in theaters",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {"type": "string"},
              "genre": {"type": "string"}
            },
            "required": ["location"]
          }
        }]
      }]
    }'

5. Search Grounding

Real-time web search integration:

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key=$GOOGLE_API_KEY" \
    -H 'Content-Type: application/json' \
    -X POST \
    -d '{
      "contents": [{"parts": [{"text": "What is the current Google stock price?"}]}],
      "tools": [{"google_search": {}}]
    }'

The response includes a groundingMetadata object listing the sources used.

6. Context Caching

For repeated queries on the same large content:

Create Cache

curl "https://generativelanguage.googleapis.com/v1beta/cachedContents?key=$GOOGLE_API_KEY" \
    -H 'Content-Type: application/json' \
    -X POST \
    -d '{
      "model": "models/gemini-2.5-flash",
      "contents": [{"parts": [{"text": "LARGE_DOCUMENT_TEXT_HERE"}]}],
      "ttl": "3600s"
    }'

Use Cache

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key=$GOOGLE_API_KEY" \
    -H 'Content-Type: application/json' \
    -X POST \
    -d '{
      "cachedContent": "cachedContents/CACHE_ID",
      "contents": [{"parts": [{"text": "Summarize the document"}]}]
    }'
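
The CACHE_ID placeholder comes from the "Create Cache" response. A minimal sketch for pulling it out is shown below, assuming that response is JSON with a `"name"` field of the form `cachedContents/...`; `cache_name_from_response` is a hypothetical helper name.

```shell
# Hypothetical filter: read the create-cache response on stdin,
# print the full cache name (e.g. cachedContents/abc123).
cache_name_from_response() {
    sed -n 's/.*"name": *"\(cachedContents\/[^"]*\)".*/\1/p' | head -n 1
}
```

The printed value can be used directly as the `cachedContent` field in the "Use Cache" request.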

7. Model Information

List All Models

curl "https://generativelanguage.googleapis.com/v1beta/models?key=$GOOGLE_API_KEY"

Get Specific Model

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash?key=$GOOGLE_API_KEY"

Response Handling

Text Response Structure

{
  "candidates": [{
    "content": {
      "parts": [{"text": "Response text here"}],
      "role": "model"
    },
    "finishReason": "STOP"
  }],
  "usageMetadata": {
    "promptTokenCount": 10,
    "candidatesTokenCount": 50,
    "totalTokenCount": 60
  }
}
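
Simple fields such as `finishReason` and the token counts can be read from this structure without a JSON parser. The sketch below uses a hypothetical `json_field` filter that only handles plain values (enum strings and numbers); it is not suitable for the generated text itself, which may contain escaped quotes. A tool such as jq would be the more robust choice if it is installed.

```shell
# Hypothetical filter: read JSON on stdin and print the first value of a simple field.
# Works for unescaped strings and numbers only.
json_field() {
    # $1 = field name, e.g. finishReason or totalTokenCount
    sed -n 's/.*"'"$1"'": *"\{0,1\}\([^",} ]*\).*/\1/p' | head -n 1
}
```

A caller could, for example, check whether `json_field finishReason` prints `MAX_TOKENS` to detect a response that was cut off by maxOutputTokens.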

Image Response Structure

Image generation responses include the image as base64-encoded data:

{
  "candidates": [{
    "content": {
      "parts": [
        {"text": "Here is your image:"},
        {"inlineData": {"mimeType": "image/png", "data": "BASE64_IMAGE_DATA"}}
      ]
    }
  }]
}

To save the image:

# Extract and decode image from response
echo "BASE64_DATA" | base64 -d > output.png
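
The extraction step can be sketched as a single filter. `save_first_image` is a hypothetical helper that assumes the response matches the structure shown above, with the image in a `"data"` field, and that only the first image is wanted.

```shell
# Hypothetical helper: read an image-generation response on stdin
# and write the first inline image to the given path.
save_first_image() {
    # $1 = output file path
    # --decode is accepted by both GNU and BSD base64.
    sed -n 's/.*"data": *"\([^"]*\)".*/\1/p' | head -n 1 | base64 --decode > "$1"
}
```

For example, piping one of the image generation curl commands into `save_first_image output.png` would write the decoded image to disk.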

Error Handling

| Error | Cause | Solution |
|-------|-------|----------|
| 400 | Invalid request | Check JSON syntax |
| 401 | Invalid API key | Verify GOOGLE_API_KEY |
| 429 | Rate limit | Wait and retry |
| 500 | Server error | Retry with exponential backoff |
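
The "retry with exponential backoff" advice can be sketched as a generic wrapper. `retry_with_backoff` is a hypothetical helper, not part of the API or of curl; it reruns any command that exits non-zero and doubles the wait between attempts.

```shell
# Hypothetical wrapper: retry a command, doubling the delay after each failure.
retry_with_backoff() {
    # $1 = maximum attempts, $2 = initial delay in seconds, remaining args = command
    local max=$1 delay=$2 attempt=1
    shift 2
    until "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            return 1    # give up after the final failed attempt
        fi
        sleep "$delay"
        delay=$((delay * 2))
        attempt=$((attempt + 1))
    done
}
```

Used as `retry_with_backoff 4 1 curl -sS --fail ...`, curl's `--fail` flag turns HTTP errors into a non-zero exit status so the wrapper can see them. Note that this simple version also retries on 400 and 401, which will not succeed on retry; a fuller version would inspect the status code and retry only on 429 and 500.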

Best Practices

Use appropriate model: Flash for speed, Pro for quality
Set temperature: Lower (0.1-0.3) for factual, higher (0.7-1.0) for creative
Limit output tokens: Set maxOutputTokens to avoid excessive responses
Use caching: For repeated queries on large documents
Handle streaming: For long responses, use streamGenerateContent
Image generation tips: Use detailed, descriptive prompts for best results
