Look At - Multimodal File Analysis
Fast, cost-effective file analysis using Google's Gemini 2.5 Flash Lite model for PDFs, images, diagrams, and other media files.
Tool Selection Enforcement
Rationalization Table - STOP When Thinking:
| Excuse | Reality | Do Instead |
|--------|---------|------------|
| "I can read images directly with Read" | You'll waste thousands of context tokens showing the full image | Use look_at for analysis |
| "I'll use Read for this PDF" | You'll lose table structure and visual information by extracting raw text | Use look_at for PDFs with tables/charts/diagrams |
| "Just a quick glance at the file" | Your quick glances still consume full context tokens | Use look_at for targeted extraction |
| "I need exact text, so Read is required" | Gemini's extraction is accurate for most use cases | Use look_at first, Read only if extraction insufficient |
| "look_at adds complexity" | You gain context savings and faster processing | Use look_at for media files |
| "The file is small" | Your small files still waste context if uninterpreted | Size doesn't determine tool choice, content type does |
| "I'll process it myself" | You waste reasoning tokens on trivial extraction | Delegate to look_at |
Red Flags - STOP Immediately When Thinking:
- If you catch yourself thinking "Let me Read this image/PDF/screenshot" → STOP. Use look_at for media files.
- If you catch yourself thinking "I can see the image directly" → STOP. Seeing it directly still wastes context. Use look_at.
- If you catch yourself thinking "Just need to glance at this diagram" → STOP. Glancing still costs context tokens. Use look_at.
- If you catch yourself thinking "The PDF is text-based, so Read is fine" → STOP. If it has structure/tables/charts, use look_at.
Cost & Context Benefits
| Scenario | Read Tool | look_at Tool |
|----------|-----------|--------------|
| PDF with table | Extracts raw text (~1000 tokens), loses table structure | Extracts table as structured data (~100 tokens) |
| Screenshot | Loads entire image (~500 tokens), requires interpretation | Describes content (~50 tokens) |
| Diagram | Shows image (~800 tokens), requires analysis | Explains architecture (~100 tokens) |
| Multi-page PDF | All pages loaded (~5000 tokens) | Extracts specific sections (~200 tokens) |
look_at typically saves about 80-95% of context tokens by returning only the relevant information. The token counts above are rough estimates, not measurements.
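The savings figure can be checked against the table with a little arithmetic. The sketch below uses the approximate token counts from the comparison table; they are illustrative estimates, and the names `SCENARIOS` and `savings` are made up for this example.

```python
# Approximate (Read, look_at) token counts from the comparison table.
# These are illustrative estimates, not measurements.
SCENARIOS = {
    "PDF with table": (1000, 100),
    "Screenshot": (500, 50),
    "Diagram": (800, 100),
    "Multi-page PDF": (5000, 200),
}


def savings(read_tokens: int, look_at_tokens: int) -> float:
    """Return the fraction of context tokens saved by using look_at."""
    return 1 - look_at_tokens / read_tokens


if __name__ == "__main__":
    for name, (read_tokens, look_at_tokens) in SCENARIOS.items():
        print(f"{name}: {savings(read_tokens, look_at_tokens):.1%} saved")
```

For these estimates the savings fall between 87.5% (diagram) and 96% (multi-page PDF).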
When to Use
Use look_at when you need to:
- Analyze media files the Read tool cannot interpret
- Extract specific information or summaries from documents
- Describe visual content in images or diagrams
- Analyze charts, tables, or structured data in PDFs
- Work with analyzed or extracted data rather than raw file contents
Never use look_at for:
- Source code or plain text files where exact contents are needed (use Read)
- Files that will be edited afterward (editing needs the literal content from Read)
- Simple file reading where no interpretation is needed
- Content whose exact formatting or structure must be preserved
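The selection rules above can be condensed into a small decision helper. This is a minimal sketch and not part of the skill: the function name `choose_tool`, its `will_edit` parameter, and the `MEDIA_EXTENSIONS` set (a subset of the supported types) are assumptions for illustration.

```python
from pathlib import Path

# Hypothetical subset of media extensions; the full list is in the
# supported file types table.
MEDIA_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".webp", ".mp4", ".mp3", ".wav"}


def choose_tool(path: str, will_edit: bool = False) -> str:
    """Return "Read" or "look_at" following the rules in this section."""
    # Files that will be edited need their literal contents from Read.
    if will_edit:
        return "Read"
    # Media files need interpretation, so delegate them to look_at.
    if Path(path).suffix.lower() in MEDIA_EXTENSIONS:
        return "look_at"
    # Source code and plain text are read exactly as they are.
    return "Read"
```

The choice depends on content type and intent, never on file size, which matches the rationalization table.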
How It Works
- Provide a file path and a specific goal (what to extract)
- The helper script uploads the file to Gemini's API
- Gemini 2.5 Flash Lite analyzes the file and extracts requested information
- Only the relevant extracted information is returned (saves context tokens)
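The steps above can be sketched with the google-genai SDK. The real look_at.py is not shown in this document, so this is an assumption about its general shape rather than its actual implementation; the function name `analyze` and the prompt wording are hypothetical. The client is passed in as a parameter so the flow can be exercised without network access.

```python
def analyze(client, file_path: str, goal: str,
            model: str = "gemini-2.5-flash-lite") -> str:
    """Upload a file, ask the model for the goal, and return only the extracted text."""
    # Upload the file to Gemini's API.
    uploaded = client.files.upload(file=file_path)
    # Ask the model to extract only what the goal requests (hypothetical prompt).
    prompt = (
        f"Goal: {goal}\n"
        "Return only the information that matches the goal. "
        "If it is not present, say so clearly."
    )
    response = client.models.generate_content(model=model, contents=[uploaded, prompt])
    # Return just the extracted information, without surrounding whitespace.
    return (response.text or "").strip()


# With the real SDK (pip install google-genai), the client would be created as:
#   import os
#   from google import genai
#   client = genai.Client(api_key=os.environ["GOOGLE_API_KEY"])
#   print(analyze(client, "/path/to/file.pdf", "Extract the title and date"))
```

Only the returned string enters the main conversation; the uploaded file itself never does, which is where the context savings come from.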
Usage Pattern
CRITICAL - Display Requirement:
Always set the Bash tool description parameter to show a clean invocation:
description: "look-at: [goal text]"
Never display the full Python command to the user.
# Basic usage
python3 ${CLAUDE_PLUGIN_ROOT}/skills/look-at/scripts/look_at.py \
--file "/path/to/file.pdf" \
--goal "Extract the title and date from this document"
# With custom model
python3 ${CLAUDE_PLUGIN_ROOT}/skills/look-at/scripts/look_at.py \
--file "/path/to/diagram.png" \
--goal "Describe the architecture shown in this diagram" \
--model "gemini-2.5-flash"
IMPORTANT:
- Always use absolute paths for files
- Always set the Bash tool description to "look-at: [goal]" for clean UX
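When the call is assembled programmatically, the two rules above (absolute path, clean description) can be enforced in one place. A minimal sketch: `build_invocation` is a hypothetical helper, and it assumes `CLAUDE_PLUGIN_ROOT` is available as an environment variable, as the commands above imply.

```python
import os
from pathlib import Path
from typing import List, Optional, Tuple


def build_invocation(file_path: str, goal: str,
                     model: Optional[str] = None) -> Tuple[List[str], str]:
    """Return (argv, description) for a look_at call."""
    # Assumption: the plugin root is exported in the environment.
    root = os.environ.get("CLAUDE_PLUGIN_ROOT", "")
    script = os.path.join(root, "skills", "look-at", "scripts", "look_at.py")
    # Resolve to an absolute path, as the skill requires.
    absolute = str(Path(file_path).expanduser().resolve())
    argv = ["python3", script, "--file", absolute, "--goal", goal]
    if model:
        argv += ["--model", model]
    # The description is what the user sees instead of the full command.
    return argv, f"look-at: {goal}"
```

Passing the arguments as a list also avoids shell-quoting problems when the goal text contains quotes or spaces.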
Response Rules
When using look_at, the response includes:
- Only the extracted information matching the goal
- Clear statement if requested information is not found
- Concise output focused on the goal (no preamble)
Use this extracted information directly in continued work without loading the full file into context.
Supported File Types
| Type | Extensions | MIME Types |
|------|-----------|------------|
| Images | .jpg, .jpeg, .png, .webp, .heic, .heif | image/* |
| Videos | .mp4, .mpeg, .mov, .avi, .webm | video/* |
| Audio | .wav, .mp3, .aiff, .aac, .ogg, .flac | audio/* |
| Documents | .pdf, .txt, .csv, .md, .html | application/pdf, text/* |
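A caller can check a file's extension against this table before invoking the script. The sketch below only encodes the table; `SUPPORTED` and `file_category` are hypothetical names, and whether look_at.py performs a similar check is not stated here.

```python
from pathlib import Path
from typing import Optional

# Extensions copied from the supported file types table.
SUPPORTED = {
    "Images": {".jpg", ".jpeg", ".png", ".webp", ".heic", ".heif"},
    "Videos": {".mp4", ".mpeg", ".mov", ".avi", ".webm"},
    "Audio": {".wav", ".mp3", ".aiff", ".aac", ".ogg", ".flac"},
    "Documents": {".pdf", ".txt", ".csv", ".md", ".html"},
}


def file_category(path: str) -> Optional[str]:
    """Return the table category for a path, or None if the extension is not listed."""
    suffix = Path(path).suffix.lower()
    for category, extensions in SUPPORTED.items():
        if suffix in extensions:
            return category
    return None
```

A `None` result means the extension is not in the table, so the file is probably better handled by another tool.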
Model Options
| Model | Use Case | Speed | Cost |
|-------|----------|-------|------|
| gemini-2.5-flash-lite | Default - fast, cheap analysis | Fastest | Lowest |
| gemini-3-flash | More complex extraction needs | Fast | Low |
| gemini-3-pro-preview | Highest accuracy required | Medium | Medium |
Default is gemini-2.5-flash-lite for optimal speed/cost ratio.
Common Patterns
REMEMBER: Always use description: "look-at: [goal]" in the Bash tool call.
Extract Specific Information
# Bash tool call with:
# description: "look-at: Extract the executive summary section"
python3 ${CLAUDE_PLUGIN_ROOT}/skills/look-at/scripts/look_at.py \
--file "/path/to/report.pdf" \
--goal "Extract the executive summary section"
Describe Visual Content
# Bash tool call with:
# description: "look-at: List all UI elements and their layout"
python3 ${CLAUDE_PLUGIN_ROOT}/skills/look-at/scripts/look_at.py \
--file "/path/to/screenshot.png" \
--goal "List all UI elements and their layout"
Analyze Diagrams
# Bash tool call with:
# description: "look-at: Explain the data flow and component relationships"
python3 ${CLAUDE_PLUGIN_ROOT}/skills/look-at/scripts/look_at.py \
--file "/path/to/architecture.png" \
--goal "Explain the data flow and component relationships"
Extract Structured Data
# Bash tool call with:
# description: "look-at: Extract the table data as JSON"
python3 ${CLAUDE_PLUGIN_ROOT}/skills/look-at/scripts/look_at.py \
--file "/path/to/table.pdf" \
--goal "Extract the table data as JSON with columns: name, value, date"
Environment Setup
Required environment variable:
export GOOGLE_API_KEY="your-api-key-here"
Required Python package:
pip install google-genai
For pixi-managed projects, add to pixi.toml:
[dependencies]
google-genai = ">=1.0.0"
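Both requirements can be verified before the first call. The following is a minimal preflight sketch; `genai_installed` and `preflight` are hypothetical helpers, not part of the skill, and the parameters exist only so the check can be tested without touching the real environment.

```python
import importlib.util
import os
from typing import List, Mapping, Optional


def genai_installed() -> bool:
    """Return True if the google-genai package can be imported."""
    try:
        return importlib.util.find_spec("google.genai") is not None
    except ModuleNotFoundError:
        # Raised when the parent "google" package is missing entirely.
        return False


def preflight(env: Optional[Mapping[str, str]] = None,
              package_ok: Optional[bool] = None) -> List[str]:
    """Return a list of setup problems; an empty list means ready to run."""
    env = os.environ if env is None else env
    package_ok = genai_installed() if package_ok is None else package_ok
    problems = []
    if not env.get("GOOGLE_API_KEY"):
        problems.append("GOOGLE_API_KEY is not set")
    if not package_ok:
        problems.append("google-genai is not installed (pip install google-genai)")
    return problems
```

Calling `preflight()` with no arguments checks the current process environment and the installed packages.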
Cost Optimization
- Gemini 2.5 Flash Lite is the most cost-effective option
- Only extracts requested information (saves on output tokens)
- Avoids loading full files into main conversation context
- Use specific goals to minimize unnecessary processing
Troubleshooting
| Issue | Solution |
|-------|----------|
| API key not set | Set GOOGLE_API_KEY environment variable |
| File not found | Use absolute paths, verify file exists |
| Large file timeout | Break into smaller files or use lower-quality images |
| Rate limit errors | Add retry logic or use batch processing |
| Empty response | Check that goal is clear and specific |
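For the rate-limit row, retry logic can be as simple as exponential backoff around the call. This is a generic sketch: `with_retries` is a hypothetical helper, and it retries on any exception for brevity, whereas a real caller would narrow that to the specific rate-limit error.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(call: Callable[[], T], attempts: int = 4, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(), retrying with exponential backoff (1s, 2s, 4s, ...) on failure."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            # Give up after the final attempt and surface the original error.
            if attempt == attempts - 1:
                raise
            # Double the wait after each failed attempt.
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

The look_at command could be wrapped as `with_retries(lambda: subprocess.run(argv, check=True, capture_output=True, text=True))`, where `argv` is the command as a list of arguments.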
Examples
See examples/ directory for:
- analyze_pdf.sh - PDF document extraction
- describe_image.sh - Image analysis
- extract_table.sh - Structured data extraction
Related Skills
- /gemini-batch - For batch processing of many files
- Standard Read tool - For text files needing exact contents