Gemini Vision Skill
Generate AI images and videos by invoking Gemini CLI's vision extension. This skill provides access to:
- Nano Banana (gemini-2.5-flash-image) - Image generation and transformation
- Veo 3 (veo-3.0-generate-001) - Video generation from images or text prompts
- Webcam capture - Live frame capture for AI processing
Prerequisites
- Gemini CLI: Must be installed and configured
- Vision Extension: Install via:
gemini extensions install vision
- API Key: Set the GEMINI_API_KEY environment variable
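A quick pre-flight check can catch two of these prerequisites before any generation is attempted. The sketch below is illustrative only: the function name `missing_prerequisites` is hypothetical, and it does not verify that the vision extension is installed, since this document does not describe a way to query that.

```python
import os
import shutil


def missing_prerequisites(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    # The skill shells out to the `gemini` executable, so it must be on PATH.
    if which("gemini") is None:
        problems.append("Gemini CLI not found on PATH")
    # The API key is read from the environment.
    if not env.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

The `env` and `which` arguments exist only so the check can be exercised without touching the real environment.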
When to Use This Skill
Use this skill when the user asks to:
- Generate images from text prompts
- Transform or reimagine existing images
- Create AI-generated videos from images
- Capture webcam frames for AI processing
- Create "nano banana" style images
- Generate Veo videos
Available Operations
1. Image Generation (Nano Banana)
Generate images from text prompts or transform existing images.
Command Pattern:
gemini -p "/vision:banana prompt=\"Your creative prompt here\" n=1 out_dir=./output"
Parameters:
| Parameter | Default | Description |
|-----------|---------|-------------|
| prompt | Required | Creative description of desired image |
| n | 1 | Number of images to generate |
| out_dir | "." | Output directory for images |
| model | gemini-2.5-flash-image | Image generation model |
Models Available:
- gemini-2.5-flash-image (default, recommended)
- gemini-3-pro-image-preview (newer, experimental)
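The command pattern and parameter table above can be assembled programmatically. This is a minimal sketch, not the extension's own tooling: `banana_argv` is a hypothetical helper, and replacing embedded double quotes with single quotes is an assumption made here because the extension's escaping rules are not documented in this skill.

```python
def banana_argv(prompt, n=1, out_dir=".", model="gemini-2.5-flash-image"):
    """Build the argv list for one /vision:banana call."""
    if not prompt.strip():
        raise ValueError("prompt is required")
    if n < 1:
        raise ValueError("n must be at least 1")
    # Assumption: swap embedded double quotes for single quotes so the
    # prompt="..." value stays well-formed.
    safe_prompt = prompt.replace('"', "'")
    slash_command = (
        f'/vision:banana prompt="{safe_prompt}" '
        f"n={n} out_dir={out_dir} model={model}"
    )
    # Returning a list (for subprocess.run) avoids a second layer of
    # shell quoting around the slash command.
    return ["gemini", "-p", slash_command]
```

Passing the result to `subprocess.run(argv, check=True)` would execute it; the list form means the backslash-escaped quotes from the shell pattern are not needed.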
2. Video Generation (Veo 3)
Generate short videos from images or prompts.
Command Pattern:
gemini -p "/vision:veo prompt=\"Animate this scene\" aspect_ratio=16:9 out_dir=./output"
Parameters:
| Parameter | Default | Description |
|-----------|---------|-------------|
| prompt | Required | Animation/motion description |
| aspect_ratio | "16:9" | Video aspect ratio (16:9 or 9:16) |
| resolution | auto | Video resolution (e.g., "1080p") |
| negative_prompt | "" | What to avoid in video |
| veo_model | veo-3.0-generate-001 | Video model |
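Since the table restricts aspect_ratio to two values, it can help to validate it before invoking the CLI. The following is a sketch under the same assumptions as the command pattern above; `veo_argv` is a hypothetical name, optional parameters are only emitted when set, and the quote replacement is an assumption rather than documented behavior.

```python
VALID_ASPECT_RATIOS = ("16:9", "9:16")


def veo_argv(prompt, aspect_ratio="16:9", out_dir=None,
             resolution=None, negative_prompt=""):
    """Build the argv list for one /vision:veo call."""
    if not prompt.strip():
        raise ValueError("prompt is required")
    if aspect_ratio not in VALID_ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {VALID_ASPECT_RATIOS}")
    # Assumption: embedded double quotes are replaced to keep values well-formed.
    safe_prompt = prompt.replace('"', "'")
    parts = [f'/vision:veo prompt="{safe_prompt}"', f"aspect_ratio={aspect_ratio}"]
    if resolution:
        parts.append(f"resolution={resolution}")
    if negative_prompt:
        safe_negative = negative_prompt.replace('"', "'")
        parts.append(f'negative_prompt="{safe_negative}"')
    if out_dir:
        parts.append(f"out_dir={out_dir}")
    return ["gemini", "-p", " ".join(parts)]
```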
3. Webcam Capture + AI
Capture from webcam and process with AI.
# Start camera
gemini -p "/vision:start"
# Capture and transform
gemini -p "/vision:banana prompt=\"Transform into oil painting\""
# Stop camera
gemini -p "/vision:stop"
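When scripting the three steps above, it is worth making sure the camera is stopped even if the transformation fails. A minimal sketch, assuming the same three commands shown above are issued as separate CLI invocations; `webcam_transform` and its `run` parameter are hypothetical names introduced for illustration.

```python
import subprocess


def webcam_transform(prompt, run=subprocess.run):
    """Start the camera, run one transformation, and always stop the camera."""
    # Assumption: embedded double quotes are replaced to keep the value well-formed.
    safe_prompt = prompt.replace('"', "'")
    run(["gemini", "-p", "/vision:start"], check=True)
    try:
        return run(
            ["gemini", "-p", f'/vision:banana prompt="{safe_prompt}"'],
            check=True,
        )
    finally:
        # Release the camera even if the transformation step raised.
        run(["gemini", "-p", "/vision:stop"], check=False)
```

The `run` argument defaults to `subprocess.run` and can be swapped for a stub when testing without a camera.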
Instructions for Claude
When the user requests image or video generation:
- Determine the operation type:
  - Text-to-image → Use /vision:banana
  - Image transformation → Use /vision:banana with input image
  - Image-to-video → Use /vision:veo
  - Webcam capture → Use /vision:capture or /vision:banana
- Construct the Gemini CLI command:
  gemini -p "/vision:<command> prompt=\"<user prompt>\" <params>"
- Execute via Bash tool:
  - Run the command
  - Capture the output paths
  - Report success and file locations to user
- Handle output:
  - Images saved as banana_*.png or banana_*.jpg
  - Videos saved as veo_*.mp4
  - Return the file paths to the user
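The "determine the operation" and "handle output" steps above can be sketched in code. This is one possible approach, not part of the extension: instead of parsing CLI output (whose format is not specified here), it snapshots the output directory before the run and reports files matching the documented name patterns that appeared afterwards. All function names are hypothetical.

```python
from pathlib import Path

# File name patterns taken from the "Handle output" step.
OUTPUT_PATTERNS = ("banana_*.png", "banana_*.jpg", "veo_*.mp4")


def choose_command(wants_video):
    """Map the request type to a slash command: video -> veo, otherwise banana."""
    return "/vision:veo" if wants_video else "/vision:banana"


def list_outputs(out_dir):
    """Return the set of generated files currently present in out_dir."""
    root = Path(out_dir)
    return {p for pattern in OUTPUT_PATTERNS for p in root.glob(pattern)}


def new_outputs(out_dir, before):
    """Return files that appeared since `before` was taken, sorted by path."""
    return sorted(list_outputs(out_dir) - before)
```

Typical use would be `before = list_outputs(out_dir)`, then running the command, then `new_outputs(out_dir, before)` to get the paths to report.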
Example Workflows
Generate a Single Image
User: "Generate an image of a cyberpunk city at sunset"
Action:
gemini -p "/vision:banana prompt=\"A sprawling cyberpunk city at sunset, neon lights reflecting off wet streets, flying cars in the distance, highly detailed, cinematic\" n=1 out_dir=."
Transform an Image
User: "Make this photo look like a Studio Ghibli scene" (with image attached)
Action:
- Save the attached image to a temp location
- Run:
gemini -p "/vision:banana prompt=\"Transform into Studio Ghibli animation style, soft colors, whimsical atmosphere\" input_paths=['/path/to/image.jpg']"
Generate a Video
User: "Create a video of ocean waves"
Action:
gemini -p "/vision:veo prompt=\"Calm ocean waves gently rolling onto a sandy beach, golden hour lighting, peaceful atmosphere\" aspect_ratio=16:9"
Webcam to Art
User: "Take a photo of me and make it look like a Renaissance painting"
Action:
# Capture and transform in one step
gemini -p "/vision:banana prompt=\"Transform into a Renaissance oil painting, dramatic lighting, classical composition\""
Output Format
Always report results in this format:
## Generated Content
**Type:** Image/Video
**Files:**
- `/path/to/banana_20251227_123456_000.png`
**Prompt Used:** [the prompt]
**Model:** gemini-2.5-flash-image
To view: Open the file path above or use `open /path/to/file`
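For consistency, the report can be generated from a small template function. A minimal sketch of the format above; `format_report` is a hypothetical helper, and using the first file in the "To view" hint is a choice made here for illustration.

```python
def format_report(kind, files, prompt, model):
    """Render the result report in the layout shown above."""
    lines = ["## Generated Content", f"**Type:** {kind}", "**Files:**"]
    lines += [f"- `{path}`" for path in files]
    lines += [f"**Prompt Used:** {prompt}", f"**Model:** {model}"]
    if files:
        # Point the viewing hint at the first generated file.
        lines.append(
            f"To view: Open the file path above or use `open {files[0]}`"
        )
    return "\n".join(lines)
```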
Error Handling
Common issues and solutions:
| Error | Solution |
|-------|----------|
| "Camera not found" | Run /vision:devices to list cameras |
| "GEMINI_API_KEY not set" | Export GEMINI_API_KEY in the environment |
| "Model not available" | Check model ID spelling |
| "Generation failed" | Try simpler prompt or different model |
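The table above can be turned into a simple lookup so a failed run yields a suggested next step. This sketch assumes the error text appears somewhere in the captured command output; the exact wording the CLI emits is not confirmed here, so the match is a case-insensitive substring test, and `suggest_fix` is a hypothetical name.

```python
# Error fragments and suggestions, mirroring the table above.
SOLUTIONS = {
    "Camera not found": "Run /vision:devices to list cameras",
    "GEMINI_API_KEY not set": "Export GEMINI_API_KEY in the environment",
    "Model not available": "Check model ID spelling",
    "Generation failed": "Try simpler prompt or different model",
}


def suggest_fix(output):
    """Return the suggested fix for the first known error found, else None."""
    lowered = output.lower()
    for message, fix in SOLUTIONS.items():
        if message.lower() in lowered:
            return fix
    return None
```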
Script Usage (Alternative)
For programmatic access, use the helper script:
python ~/.claude/skills/gemini-vision/scripts/gemini_vision.py \
--operation banana \
--prompt "Your prompt here" \
--output-dir ./output \
--count 1
Options:
- --operation: banana, veo, capture, devices
- --prompt: The generation prompt
- --output-dir: Where to save files
- --count: Number of images (for banana)
- --aspect-ratio: For veo (16:9 or 9:16)
- --model: Override default model
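If the helper script is called from other Python code, the argument list can be built from the options listed above. This is a sketch of a caller only; it makes no claim about how gemini_vision.py is implemented, and `helper_argv` is a hypothetical name. Flags are emitted only for options that are set.

```python
import os
import sys

# Script location as given in the usage example above.
SCRIPT = os.path.expanduser(
    "~/.claude/skills/gemini-vision/scripts/gemini_vision.py"
)
OPERATIONS = ("banana", "veo", "capture", "devices")


def helper_argv(operation, prompt=None, output_dir=None, count=None,
                aspect_ratio=None, model=None):
    """Build the argv list for one helper-script invocation."""
    if operation not in OPERATIONS:
        raise ValueError(f"operation must be one of {OPERATIONS}")
    argv = [sys.executable, SCRIPT, "--operation", operation]
    options = (
        ("--prompt", prompt),
        ("--output-dir", output_dir),
        ("--count", count),
        ("--aspect-ratio", aspect_ratio),
        ("--model", model),
    )
    for flag, value in options:
        if value is not None:
            argv += [flag, str(value)]
    return argv
```

The result can be handed to `subprocess.run(argv, check=True)`.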