Gemini Batch API Skill
Large-scale asynchronous document processing using Google's Gemini models.
When to Use
- Process thousands of documents with the same prompt
- Cost-effective bulk extraction (50% cheaper than synchronous API)
- Jobs that can tolerate 24-hour completion windows
IRON LAW: Use Examples First, Never Guess API
READ EXAMPLES BEFORE WRITING ANY CODE. NO EXCEPTIONS.
The Rule
User asks for batch API work
↓
MANDATORY: Read examples/batch_processor.py or examples/icon_batch_vision.py
↓
Copy the pattern exactly
↓
DO NOT guess parameter names
DO NOT try wrapper types
DO NOT improvise API calls
Why This Matters
The Batch API has non-obvious requirements, and violating them causes silent failures or cryptic errors:
- Metadata must be flat primitives - Nested objects cause cryptic errors
- dest is a config field, not a kwarg - Pass via config={"dest": "gs://..."}. Older SDKs accepted dest= directly; newer ones raise TypeError.
- Config is plain dict - Not a wrapper type
- Examples are authoritative - Working code beats assumptions
Rationale: Previous agents wasted hours debugging API errors that the examples would have prevented. The patterns in examples/ are battle-tested production code.
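The flat-metadata requirement above can be illustrated with a small helper. This is a minimal sketch, not code taken from examples/: the function name `flatten_metadata` and the sample keys are hypothetical, and it assumes that serializing nested values with json.dumps() is acceptable for your use case.

```python
import json


def flatten_metadata(metadata: dict) -> dict:
    """Return a copy of `metadata` whose values are all flat primitives.

    Hypothetical helper: str/int/float/bool pass through unchanged;
    anything else (dict, list, None, ...) is serialized to a JSON string.
    """
    flat = {}
    for key, value in metadata.items():
        if isinstance(value, (str, int, float, bool)):
            flat[key] = value
        else:
            # Nested or non-primitive value: store it as a JSON string.
            flat[key] = json.dumps(value, sort_keys=True)
    return flat


# Sample record with made-up keys, for illustration only.
example = flatten_metadata(
    {"doc_id": "a1", "pages": 3, "tags": ["x", "y"], "source": {"bucket": "b"}}
)
```

On the reading side, json.loads() recovers the nested value from the string field. Always check the result against the metadata structure in the examples before submitting.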
Rationalization Table - STOP If You Catch Yourself Thinking:
| Excuse | Reality | Do Instead |
|--------|---------|------------|
| "I know how APIs work" | You're overconfident about non-obvious gotchas | Read examples first |
| "I can figure it out" | You'll waste 30+ minutes on trial-and-error | Copy working patterns |
| "The examples might be outdated" | They're maintained and tested | Trust the examples |
| "I need to customize anyway" | Your customization comes AFTER copying base pattern | Start with examples, then adapt |
| "Reading examples takes too long" | You'll save 30 minutes debugging with 2 minutes of reading | Read examples first |
| "My approach is simpler" | Your simpler approach already failed | Use proven patterns |
Red Flags - STOP If You Catch Yourself Thinking:
- "Let me pass dest= as a kwarg" → Works on older SDKs only. Current SDK puts dest inside config={}. Read examples.
- "I'll create a CreateBatchJobConfig object" → You're instantiating a type instead of using a plain dict. Stop.
- "I'll nest metadata like a normal API" → You'll trigger BigQuery type errors. Flatten your data.
- "This should work like other Google APIs" → Your assumption is wrong; this API is different.
- "I'll figure out the JSONL format" → You'll waste time. Copy from examples instead.
MANDATORY Checklist Before ANY Batch API Code
- [ ] Read examples/batch_processor.py OR examples/icon_batch_vision.py
- [ ] Identify which example matches the use case (Standard API vs Vertex AI)
- [ ] Copy the example's API call pattern exactly
- [ ] Copy the example's JSONL structure exactly
- [ ] Copy the example's metadata structure exactly
- [ ] Adapt for specific needs only after copying base pattern
Enforcement: Writing batch API code without reading examples first violates this IRON LAW and will result in preventable errors.
Prerequisites
Install gcloud SDK
# macOS: Install via nix-darwin (add to ~/nix/ configuration)
# Or if already available: gcloud --version
# Linux: Install Google Cloud SDK from official sources
curl https://sdk.cloud.google.com | bash
Authentication Setup
# Authenticate with Google Cloud Platform
gcloud auth login
# Set up Application Default Credentials for Python libraries
gcloud auth application-default login
# Enable Vertex AI API in your project
gcloud services enable aiplatform.googleapis.com
Why both auth methods?
- gcloud auth login: For gsutil and gcloud CLI commands
- gcloud auth application-default login: For the google-genai Python library (the package behind `from google import genai`)
- CRITICAL: Vertex AI requires ADC (step 2), not just an API key
Create GCS Bucket
# Create bucket in us-central1 (required region)
gsutil mb -l us-central1 gs://your-batch-bucket
# Verify bucket location is us-central1
gsutil ls -L -b gs://your-batch-bucket | grep "Location"
See references/gcs-setup.md for complete setup guide.
Quick Start
Standard Gemini API (API Key)
Uses the Gemini File API for input. Results are returned via job.dest.file_name.
from google import genai
client = genai.Client() # Uses GOOGLE_API_KEY env var
# Upload JSONL to File API
uploaded = client.files.upload(
file="requests.jsonl",
config={"mime_type": "application/jsonl"}
)
# Submit batch job
job = client.batches.create(
model="gemini-2.5-flash-lite",
src=uploaded.name, # "files/..." URI
config={"display_name": "my-batch-job"}
)
# Results available at job.dest.file_name after completion
Vertex AI (Recommended for GCS workflows)
Uses GCS URIs directly. dest is a field of the config dict in the
current SDK (older SDKs accepted dest= as a kwarg — that now raises
TypeError: Batches.create() got an unexpected keyword argument 'dest').
from google import genai
# Use Vertex AI with ADC (not API key)
client = genai.Client(
vertexai=True,
project="your-project-id",
location="us-central1"
)
# Submit batch job with GCS paths.
# Current SDK signature: create(*, model, src, config)
job = client.batches.create(
model="gemini-2.5-flash-lite",
src="gs://bucket/requests.jsonl", # GCS input
config={
"display_name": "my-job",
"dest": "gs://bucket/outputs/", # GCS output (Vertex AI only!)
},
)
Verify your SDK before changing: inspect.signature(client.batches.create).
If dest is in the kwargs, the kwarg form works; otherwise use config.
Key difference: Standard API uses File API (files/...), Vertex AI uses GCS (gs://...) with dest (now a config field).
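The signature check described above can be wrapped in a helper that picks the right call shape. This is a sketch under stated assumptions: `build_create_kwargs`, `legacy_create` and `current_create` are hypothetical names, and the two stand-in functions only mimic the old and new signatures so the logic can run without the SDK. With a real client you would pass `client.batches.create` as `create_fn`.

```python
import inspect


def build_create_kwargs(create_fn, *, model, src, dest, display_name):
    """Build kwargs for a batches.create-style callable.

    If the callable declares a `dest` parameter, pass it as a kwarg;
    otherwise put it inside the plain config dict.
    """
    config = {"display_name": display_name}
    kwargs = {"model": model, "src": src, "config": config}
    if "dest" in inspect.signature(create_fn).parameters:
        kwargs["dest"] = dest
    else:
        config["dest"] = dest
    return kwargs


# Hypothetical stand-ins for the older and current signatures.
def legacy_create(*, model, src, dest=None, config=None):
    return None


def current_create(*, model, src, config=None):
    return None


args = dict(
    model="gemini-2.5-flash-lite",
    src="gs://bucket/requests.jsonl",
    dest="gs://bucket/outputs/",
    display_name="my-job",
)
legacy_kwargs = build_create_kwargs(legacy_create, **args)
current_kwargs = build_create_kwargs(current_create, **args)
```

The call then becomes `create_fn(**kwargs)`. This only inspects explicitly declared parameters, so a callable that swallows `**kwargs` would be treated as the current form.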
Core Workflow
Standard API:
- Create JSONL request file with prompts
- Upload JSONL to File API via client.files.upload()
- Submit batch job via client.batches.create(src=uploaded.name)
- Monitor for completion: use Monitor tool (jobs expire after 24 hours)
- Download results from job.dest.file_name
Vertex AI:
- Upload files to GCS bucket (us-central1 region required)
- Create JSONL request file with document URIs and prompts
- Submit batch job via client.batches.create(src=..., config={"dest": ...})
- Monitor for completion: use Monitor tool (jobs expire after 24 hours)
- Download and parse results from GCS output URI
- Handle failures gracefully (partial failures are common)
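The final parse-and-handle-failures steps might look like the sketch below. It assumes each output line is a JSON object with a "response" key holding the raw response. That layout is an assumption, so confirm it against real output and the examples. The text path candidates[0].content.parts[0].text is the one this document names for raw batch responses. `extract_text` and `split_results` are hypothetical helper names.

```python
import json


def extract_text(response):
    """Return candidates[0].content.parts[0].text, or None if any level is missing."""
    try:
        return response["candidates"][0]["content"]["parts"][0]["text"]
    except (KeyError, IndexError, TypeError):
        return None


def split_results(lines):
    """Split output JSONL lines into (successes, failures) by line number.

    Assumes each line is a JSON object with a "response" key (unverified).
    """
    successes, failures = [], []
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            failures.append((line_number, "invalid JSON"))
            continue
        response = record.get("response") if isinstance(record, dict) else None
        text = extract_text(response)
        if text is None:
            failures.append((line_number, "no text in response"))
        else:
            successes.append((line_number, text))
    return successes, failures


# Made-up sample lines: one success, one error record, one corrupt line.
sample = [
    json.dumps({"response": {"candidates": [{"content": {"parts": [{"text": "ok"}]}}]}}),
    json.dumps({"error": {"message": "boom"}}),
    "not json",
]
ok, bad = split_results(sample)
```

Collecting failures instead of raising lets a single bad row be retried later without discarding the rest of the job.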
Monitoring Batch Jobs with Monitor Tool
After submitting a batch job, use Monitor instead of sleep-polling in Python:
Monitor(
description="Gemini batch job progress",
persistent=true,
timeout_ms=3600000,
command="while true; do uv run python3 -c \"from google import genai; j=genai.Client().batches.get(name='$JOB_NAME'); print(f'{j.state} | {j.name}'); exit(0 if j.state in ('JOB_STATE_SUCCEEDED','JOB_STATE_FAILED','JOB_STATE_CANCELLED') else 1)\" && break; sleep 60; done"
)
This frees the conversation to continue working while the batch runs. You get notified when the job completes or fails — no polling loop blocking your context.
Key Gotchas (API Structure)
Metadata must be flat primitives (no nested objects — BigQuery-backed storage). dest is a config field, not a top-level kwarg in the current SDK (Vertex AI only). Config is a plain dict (not a wrapper type).
See the Rationalization Table in the first Iron Law section above — the same gotchas apply here. The Key Gotchas table below summarizes all critical issues.
Key Gotchas
| Issue | Solution |
|-------|----------|
| Nested metadata fails | Use flat primitives or json.dumps() for complex data |
| TypeError: unexpected keyword dest | Move dest inside config={} (Vertex AI; current SDK) |
| Mixing API patterns | Standard API: File API + no dest. Vertex AI: GCS + dest |
| Auth errors with Vertex AI | Run gcloud auth application-default login |
| vertexai=True requires ADC | API key is ignored with vertexai=True |
| Missing aiplatform API | Run gcloud services enable aiplatform.googleapis.com |
| Region mismatch (Vertex) | Use us-central1 bucket only |
| Wrong URI format (Vertex) | Use gs:// not https:// |
| Invalid JSONL | Use scripts/validate_jsonl.py |
| Image batch: inline data | Use fileData.fileUri for batch, not inline |
| Duplicate IDs | Hash file content + prompt for unique IDs |
| Large PDFs fail | Split at 50 pages / 50MB max |
| JSON parsing fails | Use robust extraction (see gotchas.md) |
| Output not found (Vertex) | Output URI is prefix, not file path |
| uploadToFileSearchStore 503 for files >10KB | Use two-step: files.upload() then fileSearchStores.importFile() |
| File stuck in PROCESSING state | Poll files.get() until state is ACTIVE before importing |
| SDK Pager stops after first page | Use pager.hasNextPage() + pager.nextPage(), NOT for await |
| Batch inlinedResponse.response.text is undefined | Response is raw JSON, not hydrated class. Use candidates[0].content.parts[0].text |
| Store document displayName is random ID after importFile | Read bibkey from customMetadata, not displayName |
| responseMimeType + tools in batch = error code 3 | Omit responseMimeType when using tools; use prompt-based JSON instructions |
Top 3 mistakes:
- Using nested objects in metadata instead of flat primitives
- Mixing Standard API and Vertex AI patterns
- Passing dest= as a kwarg instead of inside config={} (Vertex AI; current SDK)
See references/gotchas.md for detailed solutions (now with Gotchas 10-16).
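For the "Duplicate IDs" row, hashing file content together with the prompt could be sketched as follows. The function name `request_id`, the choice of SHA-256, and the 16-character truncation are illustrative assumptions, not the scheme used in references/gotchas.md.

```python
import hashlib


def request_id(file_bytes: bytes, prompt: str, length: int = 16) -> str:
    """Derive a deterministic ID from file content plus prompt.

    A NUL byte separates the two parts so that (b"ab", "c") and
    (b"a", "bc") do not hash to the same value.
    """
    digest = hashlib.sha256()
    digest.update(file_bytes)
    digest.update(b"\x00")
    digest.update(prompt.encode("utf-8"))
    return digest.hexdigest()[:length]


id_a = request_id(b"%PDF-1.7 sample", "Extract the title")
id_b = request_id(b"%PDF-1.7 sample", "Extract the title")
id_c = request_id(b"%PDF-1.7 sample", "Extract the authors")
```

The same file and prompt always produce the same ID, which makes resubmission idempotent. Changing either input produces a new ID.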
Rate Limits
| Limit | Value |
|-------|-------|
| Max requests per JSONL | 10,000 |
| Max concurrent jobs | 10 |
| Max job size | 100MB |
| Job expiration | 24 hours |
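One way to stay under the per-file request limit is to split the request list before writing JSONL files. This is a minimal sketch: `chunk_requests` is a hypothetical helper, and it only enforces the request count, not the 100MB size limit or the concurrent-job limit.

```python
from itertools import islice

MAX_REQUESTS_PER_FILE = 10_000  # "Max requests per JSONL" from the table above


def chunk_requests(requests, size=MAX_REQUESTS_PER_FILE):
    """Yield consecutive lists of at most `size` requests."""
    iterator = iter(requests)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# 25,000 placeholder requests split into three files' worth.
chunk_sizes = [len(chunk) for chunk in chunk_requests(range(25_000))]
```

Each chunk would then be written to its own JSONL file and submitted as a separate job, keeping no more than 10 jobs running at once.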
Recommended Models
| Model | Use Case | Cost | Location | Thinking default |
|-------|----------|------|----------|------------------|
| gemini-2.5-flash-lite | Most batch jobs | Lowest | us-central1 | OFF |
| gemini-2.5-flash | Complex extraction | Medium | us-central1 | OFF |
| gemini-2.5-pro | Highest accuracy | Highest | us-central1 | ON (cannot disable) |
| gemini-3-flash-preview | New gen, larger context | 5× flash-lite | global | HIGH (set MINIMAL!) |
| gemini-3.1-flash-lite-preview | Cheapest gen-3 | ~2× 2.5 flash-lite | global | HIGH (set MINIMAL!) |
| gemini-embedding-001 | Default for text-only (short titles, classification, retrieval over text) | Low | Standard API | n/a |
| gemini-embedding-2 | Multimodal (text+image) inputs | Low | Standard API | n/a |
| text-embedding-005 | Need Vertex Batch console visibility (legacy) | Low | us-central1 | n/a |
Critical for Gemini 3.x: Always set thinkingConfig: {thinkingLevel: "MINIMAL"} in generationConfig or batch responses will silently fail with MAX_TOKENS and empty content. See references/gotchas.md Gotcha 12.
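The thinkingConfig setting above can be applied with a small helper so it is never forgotten. This sketch covers only the generationConfig dict itself. Where that dict sits inside each JSONL request is not shown here and should be copied from the examples. `with_minimal_thinking` is a hypothetical name.

```python
import copy


def with_minimal_thinking(generation_config=None):
    """Return a copy of generation_config with thinkingLevel forced to MINIMAL."""
    config = copy.deepcopy(generation_config) if generation_config else {}
    # Preserve any other thinkingConfig keys the caller already set.
    config.setdefault("thinkingConfig", {})["thinkingLevel"] = "MINIMAL"
    return config


original = {"temperature": 0}
patched = with_minimal_thinking(original)
```

The input dict is left untouched, so one base config can be shared between Gemini 2.5 and Gemini 3.x request builders.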
Critical for embedding batches: Embedding work has its own rules and failure modes — use file-based JSONL with per-row key on the Standard API; never inlined_requests (scrambles order at scale). Default to gemini-embedding-001 for text-only tasks. See references/embeddings.md and examples/embeddings_batch.py.
Additional Resources
References
- references/embeddings.md - NEW: Dedicated reference for embedding batches (model choice, file-based + keyed pattern, sentinel verification)
- references/gcs-setup.md - Complete GCS and Vertex AI setup guide
- references/gotchas.md - 14 critical production gotchas (Gemini 3.x thinking, location='global'; embedding gotcha now lives in embeddings.md)
- references/best-practices.md - Idempotent IDs, state tracking, validation
- references/scale-up-testing.md - Incremental scale-up testing (LangExtract prototyping, LLM-as-judge, Vertex AI batch)
- references/troubleshooting.md - Common errors and debugging
- references/vertex-ai.md - Enterprise alternative with comparison
- references/cli-reference.md - gsutil and gcloud commands
Examples
- examples/icon_batch_vision.py - NEW: Batch vision analysis with Vertex AI
- examples/batch_processor.py - Complete GeminiBatchProcessor class
- examples/embeddings_batch.py - NEW: gemini-embedding-2 via client.batches.create_embeddings() (the only supported production path; Vertex Batch rejects this model)
- examples/pipeline_template.py - Customizable pipeline template
Scripts
- scripts/validate_jsonl.py - Validate JSONL before submission
- scripts/test_single.py - Test single request before batch
External Documentation
Date Awareness
Gemini API evolves rapidly. For API features or model names with uncertainty, verify against current documentation.