Generate Illustrations with Gemini
Analyze a blog post or topic, propose illustrations, and generate them via the Gemini image generation API.
Arguments
Parse the user's input for:
- Target: A file path (e.g., _d/four-healths.md) or a freeform topic (e.g., "meditation benefits")
- --style 'description': Override the default illustration style entirely
- --ref 'path': One or more reference images for character consistency (can be repeated). When using the default raccoon style, always pass the canonical reference image (see below) unless the user opts out
- --api-url 'url': Override the Gemini API endpoint (default below)
- --count N: Max number of images to generate (default: 3)
- --aspect 'W:H': Aspect ratio via imageConfig (default: 3:4, portrait). Valid values: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9
- --transparent: Generate on a magenta chroma-key background (#FF00FF), then strip it via an ImageMagick border-seeded flood fill. The scanner samples pixels along every image edge and keeps only those that are actually near #FF00FF as flood seeds, so shots where Gemini rendered grass or scenery into some corners still strip cleanly. (The earlier 4-corners-only approach broke on such shots: the flood fill started from grass at 30% fuzz and ate the subject.) Only magenta pixels reachable from the image edges are made transparent; interior magenta-tinted pixels (pink fur highlights, glass reflections, lobster-claw reds) are preserved automatically. Fast (sub-second) and pixel-accurate, with no ML needed. Requires magick (ImageMagick). If the subject fills the frame and no border pixel is near-magenta, the strip is skipped with an error rather than producing a swiss-cheese result. After the strip, two layered evals auto-run; see Automatic eval below.
- --no-eval: Skip the alpha-mask eval pass that looks for interior holes, residual magenta, and edge fringe (needs numpy/pillow/scipy; the uv run --script shebang installs them automatically, but plain python3 invocations without uv may need this flag). The alpha-mean signal still runs.
- --eval-strict: Exit nonzero when any alpha-mask eval threshold trips. Useful when a calling agent wants to retry or fail loudly instead of silently shipping a broken alpha mask.
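The border-seeded flood fill described under --transparent can be illustrated with a small pure-Python sketch. This is not the ImageMagick pipeline generate.py actually runs; it is a stand-in that operates on a grid of RGB tuples, and the names near_magenta and strip_border_magenta as well as the tolerance value are assumptions made for illustration.

```python
from collections import deque

MAGENTA = (255, 0, 255)


def near_magenta(pixel, tolerance=60):
    """True when every RGB channel is within `tolerance` of #FF00FF (assumed metric)."""
    return all(abs(c - m) <= tolerance for c, m in zip(pixel, MAGENTA))


def strip_border_magenta(pixels, tolerance=60):
    """Return an alpha mask (0 = transparent, 255 = opaque) for a grid of RGB tuples.

    Only border pixels that are near magenta seed the flood, so scenery in a
    corner never becomes a seed. Raises ValueError when no border pixel
    qualifies, mirroring the "skip with an error" behavior described above.
    """
    height, width = len(pixels), len(pixels[0])
    seeds = [
        (r, c)
        for r in range(height)
        for c in range(width)
        if (r in (0, height - 1) or c in (0, width - 1))
        and near_magenta(pixels[r][c], tolerance)
    ]
    if not seeds:
        raise ValueError("no near-magenta border pixel; skipping strip")

    alpha = [[255] * width for _ in range(height)]
    for r, c in seeds:
        alpha[r][c] = 0
    queue = deque(seeds)
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < height and 0 <= nc < width
            if inside and alpha[nr][nc] == 255 and near_magenta(pixels[nr][nc], tolerance):
                alpha[nr][nc] = 0
                queue.append((nr, nc))
    return alpha
```

Because the flood only spreads through connected near-magenta pixels, a magenta-tinted pixel enclosed by the subject stays opaque.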
Automatic eval
When --transparent is active, generate.py runs two complementary evals on the output and prints metrics to stderr.
(1) Alpha-mean signal (evaluate_strip, always on — same thresholds test_generate.py asserts):
- Status healthy: alpha mean in the 15–85% band.
- Status subject_eaten: alpha mean below 15%; the strip ate the subject. Regenerate with a magenta border on all four sides.
- Status nothing_stripped: alpha mean above 85%; subject fills the frame. Widen the crop.
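The three statuses reduce to a simple banding of the alpha mean. The sketch below shows that banding; the function names are hypothetical (this is not the evaluate_strip implementation), and treating exactly 15% and 85% as healthy is an assumption.

```python
def alpha_mean_percent(alpha_values):
    """Mean of 0-255 alpha values, expressed as a percentage of fully opaque."""
    return sum(alpha_values) / (len(alpha_values) * 255) * 100


def classify_alpha_mean(alpha_mean_pct):
    """Map an alpha mean (in percent) to one of the three documented statuses."""
    if alpha_mean_pct < 15.0:
        return "subject_eaten"      # the strip removed most of the image
    if alpha_mean_pct > 85.0:
        return "nothing_stripped"   # almost nothing became transparent
    return "healthy"
```

For example, the alpha=51.3% reading from the sample output falls in the healthy band.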
(2) Alpha-mask quality signal (eval_alpha, opt-out via --no-eval):
- interior_hole_px: pixels in transparent regions that only become enclosed once the opaque mask is morphologically closed by 1 pixel. This isolates flood-fill bleed-through damage: thin 1–2-pixel channels that the chroma strip drills through the character (neck, between fingers, limb outlines). Such channels topologically connect a real interior hole to the outside background, so a naive "enclosed transparent" check reports zero. Legitimate design gaps (armpit openings, space between legs) are wider than 2 px and stay unaffected by the closing, so they don't false-alarm.
- interior_hole_largest_px: pixels in the single biggest channel-revealed hole. More stable across images, and a good thresholding target because one big visible hole is what a human notices.
- residual_magenta_px: opaque pixels still near the chroma color (signals trapped magenta pockets the flood fill couldn't reach).
- edge_fringe_px: partial-alpha pixels (signals halo).
Output format:
eval [healthy] out.webp: alpha=51.3% size=74.0KB
[eval] /tmp/out.webp: holes=0 (largest=0), residual=0, fringe=0 [OK]
[eval] /tmp/out.webp: holes=4508 (largest=4356), residual=0, fringe=0 [WARN: interior damage likely — check alpha mask]
Thresholds for the mask-quality signal are conservative by default (holes > 500, residual > 500, fringe > 2000). Pass --no-eval to skip it. Pass --eval-strict to exit nonzero when a mask-quality threshold trips.
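As a rough illustration of how these thresholds and --eval-strict interact, here is a minimal sketch. The metric keys mirror the printed output, but the function names, the dictionary layout, and the specific exit code of 1 are assumptions; the source only says the exit is nonzero.

```python
# Default thresholds quoted above; a metric trips when it exceeds its limit.
THRESHOLDS = {"holes": 500, "residual": 500, "fringe": 2000}


def tripped_thresholds(metrics):
    """Return the names of metrics that exceed their threshold."""
    return [name for name, limit in THRESHOLDS.items() if metrics[name] > limit]


def exit_code(metrics, eval_strict=False):
    """0 unless strict mode is on and at least one threshold tripped."""
    return 1 if eval_strict and tripped_thresholds(metrics) else 0
```

With the WARN sample above (holes=4508, residual=0, fringe=0), only the holes threshold trips, and the run fails only when strict mode is requested.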
Why: visual inspection on a light or dark background hides interior damage (holes read as shadow/shading). The alpha mask is the ground truth. Baking both evals into the skill makes them the default, so a silently-broken output can't ship. See /hill-climbing for the "eval becomes regression guard" pattern.
Configuration
- Auth: GOOGLE_API_KEY, auto-loaded from ~/.env by generate.py
- Default style: Read from raccoon-style.txt (in this skill's directory) by generate.py
- Reference image: Auto-resolved by generate.py (searches ~/gits/blog*/images/raccoon-nerd.webp)
- Low-level script: gemini-image.sh handles single API calls (used internally by generate.py)
- Generation wrapper: ../image-explore/generate.py handles env loading, style, ref image, and parallel batch execution
When --style is provided, it replaces the default raccoon style entirely (it is not appended).
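For readers who want a mental model of the configuration rules, the sketch below shows one plausible way a KEY=VALUE env file could be parsed and how a style override replaces the default. How generate.py actually does this is not documented here; parse_env_lines and resolve_style are hypothetical names and the parsing rules are assumptions.

```python
def parse_env_lines(lines):
    """Parse KEY=VALUE lines, skipping blanks and comments (assumed .env dialect)."""
    values = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve_style(style_override, default_style):
    """An override replaces the default style entirely; it is never appended."""
    return style_override if style_override is not None else default_style
```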
Workflow
Phase 1: Read & Analyze Content
If the target is a file path:
- Read the file
- Identify the main themes/sections
- Note any existing images (look for blob_image, local_image, image_float includes, and raw markdown images)
- Identify sections that would benefit from illustrations, prioritizing sections that have no images yet
If the target is a freeform topic:
- Use it directly as the theme for image generation
- Skip the content analysis and go straight to Phase 2
Phase 2: Design Illustrations
For each illustration opportunity, prepare:
- Section: Which part of the post it enhances (or "standalone" for topic-based)
- Filename: Following the project convention: raccoon-{descriptor}.webp for raccoon style, {descriptor}.webp otherwise
- Prompt: A detailed generation prompt that combines the style + the specific scene/action
  - For raccoon style, include a Shirt text: 'SOMETHING' directive relevant to the section
  - Include the aspect ratio in the prompt (e.g., "portrait orientation, 3:4 aspect ratio")
Present at most --count illustrations (default 3).
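The filename convention and prompt assembly above can be sketched as follows. These helpers are hypothetical and only illustrate the Phase 2 rules; in practice generate.py takes the scene and shirt text as separate options and applies the style itself, and the ". " joining format is an assumption.

```python
def illustration_filename(descriptor, raccoon_style=True):
    """raccoon-{descriptor}.webp for raccoon style, {descriptor}.webp otherwise."""
    prefix = "raccoon-" if raccoon_style else ""
    return f"{prefix}{descriptor}.webp"


def orientation(aspect):
    """Name the orientation implied by a 'W:H' aspect string."""
    width, height = (int(part) for part in aspect.split(":"))
    if width == height:
        return "square"
    return "portrait" if width < height else "landscape"


def build_prompt(style, scene, aspect="3:4", shirt=None):
    """Combine style, scene, optional shirt directive, and aspect ratio."""
    parts = [style.strip(), scene.strip()]
    if shirt:
        parts.append(f"Shirt text: '{shirt}'")
    parts.append(f"{orientation(aspect)} orientation, {aspect} aspect ratio")
    return ". ".join(parts)
```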
Phase 3: Confirm with User
Present the illustration plan as a table:
| #   | Section | Filename                | Prompt Summary                                  |
| --- | ------- | ----------------------- | ----------------------------------------------- |
| 1   | Health  | raccoon-kettlebell.webp | Raccoon lifting kettlebell, shirt: "FIT FELLOW" |
| 2   | Family  | raccoon-picnic.webp     | Raccoon at family picnic, shirt: "FAMILY TIME"  |
Ask the user to approve, modify, or remove items before generating. Use AskUserQuestion to confirm.
Phase 4: Generate Images
Use generate.py from the image-explore skill. It handles env loading (~/.env), raccoon style
(from raccoon-style.txt), reference image resolution, and parallel batch execution automatically.
- Resolve the script path:

      CHOP_ROOT="$(cd "$(dirname "$(readlink -f ~/.claude/skills/gen-image/SKILL.md)")" && git rev-parse --show-toplevel)"
      GEN="$CHOP_ROOT/skills/image-explore/generate.py"

- Single image:

      uv run "$GEN" single --scene "Raccoon lifting kettlebell in a gym" --shirt "FIT" --output raccoon-kettlebell.webp

  The script's PEP 723 inline metadata lets uv auto-install deps (typer + numpy/pillow/scipy for the --transparent eval). Pass --aspect, --ref, or --style to override defaults. Under --transparent, pass --no-eval to skip the mask-quality eval for stock python3 callers without numpy/scipy, and --eval-strict to exit nonzero when any eval threshold trips.

- Multiple images (parallel): Write a JSON file and use batch mode:

      [
        { "scene": "Raccoon lifting kettlebell in a gym", "shirt": "FIT", "output": "raccoon-kettlebell.webp" },
        { "scene": "Raccoon at family picnic", "shirt": "FAMILY", "output": "raccoon-picnic.webp" }
      ]

      uv run "$GEN" batch illustrations.json --aspect 3:4

- After generation, show each image to the user by reading the file with the Read tool (which renders images inline).

- If generation fails, report the error and ask if the user wants to retry with a modified prompt or skip.
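The batch file used in the steps above can be produced from the approved plan with a few lines of Python. This is a sketch under assumptions: batch_json is a hypothetical helper, and it assumes every plan item carries the three keys shown in the batch example (scene, shirt, output).

```python
import json


def batch_json(plan, count=3):
    """Serialize at most `count` plan items using the keys from the batch example."""
    entries = [
        {"scene": item["scene"], "shirt": item["shirt"], "output": item["output"]}
        for item in plan[:count]
    ]
    return json.dumps(entries, indent=2)
```

Writing the returned string to illustrations.json gives a file suitable for the batch command shown above.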
Auto-eval runs on every --transparent generation. Right after the chroma-key pass, generate.py runs two complementary evals: the alpha-mean signal (always) and the alpha-mask quality signal (interior holes, residual magenta, edge fringe; opt-out via --no-eval). Details and thresholds are in the Automatic eval subsection above. See /hill-climbing for the "eval becomes regression guard" pattern.
Verifying transparent output. Don't judge chroma-key quality by compositing on a solid background — interior holes read as the background color. Extract the alpha channel as a mask: magick out.webp -alpha extract mask.png. A clean mask is a solid silhouette; swiss-cheese holes mean the chroma ate interior color data.
Never chain chroma passes on different magenta tones (e.g. a second pass on #E040E0 to catch pink shadow remnants). It eats magenta-tinted highlights inside fluffy characters. If the first pass has fringe, regenerate the source with a stricter prompt (no shadow on ground, no gradient, no environment) rather than filtering harder.
Phase 5: Insert References (Optional)
Ask the user if they want the images inserted into the post. If yes:
- For images stored in the blog's assets/images/ directory, use:

      {% include local_image_float_right.html src="filename.webp" %}

- For images stored in the external blob repo (idvorkin/blob), use:

      {% include blob_image_float_right.html src="blog/filename.webp" %}

- Insert the include tag just below the relevant section header (after any front matter or introductory text)
If the target was a freeform topic (not a file), skip this phase — just tell the user where the files were saved.
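The insertion step in this phase could be automated roughly as below. Both helpers are hypothetical, and the sketch makes simplifying assumptions: headers are markdown lines starting with #, the header text matches exactly, and the tag goes directly under the header rather than after any introductory text.

```python
def include_tag(filename, storage="local"):
    """Build the include tag for a locally stored or blob-stored image."""
    if storage == "local":
        return f'{{% include local_image_float_right.html src="{filename}" %}}'
    if storage == "blob":
        return f'{{% include blob_image_float_right.html src="blog/{filename}" %}}'
    raise ValueError(f"unknown storage: {storage}")


def insert_below_header(markdown, header, tag):
    """Insert `tag` on its own line directly below the first matching header."""
    lines = markdown.splitlines()
    for index, line in enumerate(lines):
        if line.startswith("#") and line.lstrip("#").strip() == header:
            lines[index + 1:index + 1] = ["", tag]
            return "\n".join(lines) + "\n"
    raise ValueError(f"header not found: {header}")
```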
Output Directory
- If editing a blog post, save images to the same directory convention the blog uses (ask user if unsure)
- If freeform topic, save to ~/tmp/ and tell the user the paths
Error Handling
- Missing API key: generate.py auto-loads from ~/.env. If still missing, tell the user to set GOOGLE_API_KEY
- API error: Show the error message, suggest checking the API key or endpoint
- No jq: The helper script (gemini-image.sh) requires jq
- No cwebp: Images will be saved as PNG instead of WebP; inform the user
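A preflight check covering the conditions above might look like the following sketch. The preflight function is hypothetical and not part of generate.py; since generate.py loads ~/.env itself, the key check here only reflects the current process environment, and the magick check is included on the assumption that --transparent was requested.

```python
import os
import shutil


def preflight(environ=None, which=shutil.which, transparent=False):
    """Return human-readable problems found before generating; empty list if none."""
    environ = os.environ if environ is None else environ
    problems = []
    if not environ.get("GOOGLE_API_KEY"):
        problems.append("GOOGLE_API_KEY is not set; add it to ~/.env or export it")
    if which("jq") is None:
        problems.append("jq not found; gemini-image.sh requires it")
    if which("cwebp") is None:
        problems.append("cwebp not found; images will be saved as PNG instead of WebP")
    if transparent and which("magick") is None:
        problems.append("magick not found; --transparent requires ImageMagick")
    return problems
```

The which parameter is injectable so the check can be exercised without depending on what is installed on the machine.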
Safety
- Always confirm before generating (API calls cost money)
- Never generate more than 5 images in a single run without explicit user approval
- Show each generated image to the user for review