Agent Skills: Image Explore - Visual Direction Brainstorming

Brainstorm multiple visual directions for a blog image, generate them in parallel, build a comparison page, and optionally publish as a shareable link (Surge.sh or gist).

ID: idvorkin/chop-conventions/image-explore

Install this agent skill locally:

pnpm dlx add-skill https://github.com/idvorkin/chop-conventions/tree/HEAD/skills/image-explore

Skill Files

skills/image-explore/SKILL.md

Skill Metadata

  • Name: image-explore
  • Description: Brainstorm multiple visual directions for a blog image, generate them in parallel, build a comparison page, and optionally publish as a shareable link (Surge.sh or gist).

Image Explore - Visual Direction Brainstorming

Generate multiple distinct visual directions for a blog image, render them all in parallel, build a comparison page, and optionally publish as a shareable link for feedback.

Arguments

Parse the user's input for:

  • Target: A file path (e.g., _d/ai-native-manager.md) or a freeform topic (e.g., "chaos of AI adoption")
  • --count N: Number of directions to brainstorm and generate (default: 5, max: 8)
  • --variants N: Number of minor variants per direction (default: 1, max: 3). Each variant tweaks the scene (different angle, lighting, composition) while keeping the same concept and shirt text.
  • --style 'description': Override the default illustration style (passed through to gen-image)
  • --aspect 'W:H': Aspect ratio (default: 3:4). Valid: 1:1, 2:3, 3:2, 3:4, 4:3, 9:16, 16:9
  • --ref 'path': Override reference image (default: raccoon canonical ref)
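The parsing rules above could be sketched with argparse (a sketch only; the actual skill parses freeform user input rather than CLI flags, and the function name is illustrative):

```python
import argparse

def parse_args(argv):
    # Illustrative parser for the arguments described above;
    # defaults and caps come from this doc.
    p = argparse.ArgumentParser(description="image-explore arguments")
    p.add_argument("target", help="file path or freeform topic")
    p.add_argument("--count", type=int, default=5, help="directions to brainstorm (max 8)")
    p.add_argument("--variants", type=int, default=1, help="variants per direction (max 3)")
    p.add_argument("--style", help="override the default illustration style")
    p.add_argument("--aspect", default="3:4",
                   choices=["1:1", "2:3", "3:2", "3:4", "4:3", "9:16", "16:9"])
    p.add_argument("--ref", help="override the reference image path")
    args = p.parse_args(argv)
    args.count = min(args.count, 8)       # enforce the documented caps
    args.variants = min(args.variants, 3)
    return args
```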

Workflow

Phase 1: Analyze Content

If the target is a file path:

  1. Read the file
  2. Identify the hook — what's the one idea a reader should remember?
  3. Note key metaphors, section themes, emotional arc
  4. Check for existing images (look for imagefeature, local_image, blob_image includes)

If the target is a freeform topic:

  1. Use it directly as the creative brief
  2. Skip to Phase 2

Phase 2: Brainstorm Directions

This is the creative core. Generate --count distinct visual directions. Each direction must have:

  • Name: 2-4 evocative words (e.g., "Circus Ringmaster", "Surfing the Wave")
  • Section: Which part of the post it maps to (or "standalone")
  • Scene: One-sentence description of the image
  • Vibe: What feeling it evokes (e.g., "controlled chaos", "quiet confidence")
  • Shirt text: For raccoon style, what the shirt reads (max 8 chars)

Directions must be meaningfully different. Vary across these axes:

  • Literal vs. metaphorical
  • Action vs. stillness
  • Humor vs. gravitas
  • Individual vs. group scene
  • Indoor vs. outdoor / grounded vs. fantastical

Avoid generating 5 variations of the same idea. If the post has one dominant metaphor, use it for at most 2 directions and find fresh angles for the rest.

Present directions as a table:

| #   | Name             | Section       | Scene                                    | Vibe           | Shirt   |
| --- | ---------------- | ------------- | ---------------------------------------- | -------------- | ------- |
| A   | Mission Control  | Year of Chaos | Raccoon at NASA console, screens on fire | "This is fine" | SHIP IT |
| B   | Surfing the Wave | AI Adoption   | Raccoon surfing tidal wave of AI debris  | Riding chaos   | SHIP IT |

Confirm with user via AskUserQuestion before generating. User may add, remove, or modify directions.

If --variants > 1: After user approves the directions, craft variant scenes for each. Each variant keeps the same concept, shirt text, and vibe, but varies the specific scene description (different angle, setting detail, composition, or lighting). Do NOT present variant scenes for approval — just generate them.
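The variant bookkeeping can be sketched as a small helper (hypothetical function; it follows the grouped-entry shape and {slug}-v{N}.webp naming used in Phase 3):

```python
def expand_variants(direction, variant_scenes):
    """Expand one approved direction into grouped variant entries.

    direction: dict with name/section/vibe/shirt; variant_scenes: one
    scene description per variant. Each entry keeps the concept, shirt
    text, and vibe, varying only the scene.
    """
    slug = direction["name"].lower().replace(" ", "-")
    entries = []
    for n, scene in enumerate(variant_scenes, start=1):
        entries.append({
            "name": f'{direction["name"]} v{n}',
            "group": direction["name"],
            "section": direction["section"],
            "vibe": direction["vibe"],
            "shirt": direction["shirt"],
            "scene": scene,
            "output": f"{slug}-v{n}.webp",
        })
    return entries
```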

Phase 3: Generate Images in Parallel

  1. Resolve the script path once:

    CHOP_ROOT="$(cd "$(dirname "$(readlink -f ~/.claude/skills/image-explore/SKILL.md)")" && git rev-parse --show-toplevel)"
    GEN="$CHOP_ROOT/skills/image-explore/generate.py"
    
  2. Write a directions.json file with all directions (used by both Phase 3 and Phase 4).

    Without variants (1 entry per direction):

    [
      {
        "name": "Mission Control",
        "section": "Year of Chaos",
        "vibe": "This is fine",
        "shirt": "SHIP IT",
        "scene": "Raccoon at NASA console, screens showing fire",
        "output": "mission-control.webp"
      }
    ]
    

    With variants (multiple entries per direction, grouped by group field):

    [
      {
        "name": "Mission Control v1",
        "group": "Mission Control",
        "section": "Year of Chaos",
        "vibe": "This is fine",
        "shirt": "SHIP IT",
        "scene": "Raccoon at NASA console, screens showing fire, dramatic front view",
        "output": "mission-control-v1.webp"
      },
      {
        "name": "Mission Control v2",
        "group": "Mission Control",
        "section": "Year of Chaos",
        "vibe": "This is fine",
        "shirt": "SHIP IT",
        "scene": "Raccoon at NASA console seen from side, leaning back in chair sipping tea",
        "output": "mission-control-v2.webp"
      }
    ]
    

    The group field enables build-page.py to group variants under a shared heading. Output filenames follow the pattern {slug}-v{N}.webp when using variants.

    Scene-first prompt ordering ("scene_first": true):

    By default, prompts are assembled as: character style → "large & prominent 40%" → scene. This works well for character-focused images but fights wide-field compositions where characters should be small elements in a larger scene.

    Set "scene_first": true on any direction where the scene composition matters more than character prominence. This reorders the prompt to: scene → character style → shirt text, and drops the "40% of image" instruction. Use it for:

    • Bird's-eye/aerial views of fields, landscapes, maps
    • Group scenes where many characters are small
    • Any composition where the environment dominates

    {
      "name": "Overhead Field",
      "scene": "Aerial drone shot of a soccer field with raccoons scattered across it...",
      "shirt": "NEXT",
      "output": "overhead-field.webp",
      "scene_first": true
    }
    
  3. Generate all images in parallel with a single command:

    uv run "$GEN" --batch directions.json
    

    Pass --aspect, --ref, or --style if overriding defaults. The script handles env loading, prompt assembly, ref image resolution, and parallel execution via thread pool (secrets never leak into command strings).

  4. After batch completes, the directions JSON is automatically augmented with _prompt and _duration_s fields for each entry (used by the comparison page for debug info).
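The scene-first reordering from step 2 might be assembled roughly like this (a sketch only; generate.py's exact prompt wording is an assumption here):

```python
def assemble_prompt(style, scene, shirt, scene_first=False):
    # Sketch of the two orderings described above.
    shirt_part = f'Shirt text reads "{shirt}".'
    if scene_first:
        # Scene leads; the "40% of image" prominence instruction is dropped.
        return f"{scene} {style} {shirt_part}"
    # Default: character style leads, with the prominence instruction.
    return (f"{style} Large & prominent, about 40% of the image. "
            f"{scene} {shirt_part}")
```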

Phase 3b: Verify & Retry

After generation, verify each image actually matches its scene description before showing to the user. This catches cases where Gemini ignores complex scene descriptions (e.g., split-screens, multiple characters, specific compositions).

  1. Launch background sub-agents in parallel (one per image) to verify each result. Each agent should:
    • Read the generated image file (Claude has vision)
    • Compare what it sees against the scene description
    • Check for:
      • Scene composition: Does the layout match? (e.g., split-screen actually split, multiple characters present)
      • Key elements: Are the described elements visible? (e.g., dust cloud, trajectory arc, binoculars)
      • Shirt text: Is it readable and roughly correct?
    • Return a verdict: pass (scene clearly rendered) or fail with a short explanation of what's wrong
  2. Collect results from all agents. This runs concurrently and doesn't block other work.
  3. Write verification results back to the directions JSON — for each entry, add:
    • _verification: "pass" or "fail"
    • _verification_reason: short explanation (e.g., "Solo raccoon portrait, no soccer field or group scene") These fields are rendered in the comparison page's collapsible debug details.
  4. For any failures, retry up to 2 times:
    • Strengthen the scene description (be more explicit, add emphasis)
    • Re-run generate.py for just the failed entries (write a temporary batch JSON)
    • Launch verification agents again for the retried images
    • Update _verification and _verification_reason with the retry result
  5. After retries, report any still-failing images to the user rather than silently including bad results

What counts as a failure:

  • Single character when scene calls for a group (or vice versa)
  • Missing the core concept entirely (e.g., "split-screen" rendered as single scene)
  • Wrong setting (indoor when outdoor was specified)

What does NOT count as a failure:

  • Shirt text slightly wrong (Gemini often struggles with exact text)
  • Style differences from the reference
  • Minor composition differences (angle, lighting)

  6. Show all verified images to the user with the Read tool.
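The retry batching in step 4 might look roughly like this (a sketch; assumes the verification agents have already written _verification back into the JSON):

```python
import json
import tempfile

def write_retry_batch(directions_path):
    """Collect failed entries into a temporary batch JSON for re-generation.

    Returns the temp file path (pass it to: uv run "$GEN" --batch <path>),
    or None if nothing failed.
    """
    with open(directions_path) as f:
        entries = json.load(f)
    failed = [e for e in entries if e.get("_verification") == "fail"]
    if not failed:
        return None
    tmp = tempfile.NamedTemporaryFile(
        mode="w", suffix=".json", prefix="retry-", delete=False)
    json.dump(failed, tmp, indent=2)
    tmp.close()
    return tmp.name
```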

Phase 4: Build Comparison Page

Build and serve the comparison page (reuses the same directions.json; build-page.py reads name/section/vibe/shirt and accepts either image or output for the file path):

uv run "$CHOP_ROOT/skills/image-explore/build-page.py" \
  --title "Image Explore: Topic Name" \
  --dir docs/image-explore-topic/ \
  --images-dir images/ \
  directions.json

Options:

  • --images-dir PATH: Where to find generated images (default: current directory). Useful when images were written to a different directory than where you run the command (e.g., images/).
  • --no-serve: Skip starting the HTTP server.

This creates the showboat doc, converts images, generates HTML via pandoc, and starts a local HTTP server. It prints the Tailscale URL.

When directions.json contains entries with a group field, the page groups variants under shared direction headings with sub-headers for each variant.
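The grouping behavior could be implemented along these lines (a sketch, not build-page.py's actual code):

```python
def group_entries(entries):
    """Group directions.json entries for the comparison page.

    Entries sharing a "group" field render under one shared heading;
    ungrouped entries stand alone under their own name.
    """
    groups = {}
    for e in entries:
        key = e.get("group", e["name"])
        groups.setdefault(key, []).append(e)
    return groups
```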

Phase 5: Publish (Ask First)

Ask the user: "Want to publish this as a shareable link?" and offer two options:

  1. Surge.sh (Recommended) — Full HTML/CSS/JS support, lightbox clicking works
  2. GitHub Gist — Simpler but gisthost may block inline JS (lightbox won't work)

Option A: Surge.sh

# Prepare deploy directory (random path to avoid accidental overwrites)
SURGE_DIR=$(mktemp -d /tmp/surge-XXXXXXXX)
cp <docs-dir>/demo.html "$SURGE_DIR/index.html"
cp <docs-dir>/*.png "$SURGE_DIR/"

# Deploy (pick a descriptive subdomain)
surge "$SURGE_DIR" <descriptive-name>.surge.sh

Option B: GitHub Gist

Uses the gist-image skill technique (create gist, clone, push binary files via git) plus gisthost-specific URL rewriting. The helper script automates this:

uv run "$CHOP_ROOT/skills/image-explore/publish-gist.py" demo.html --title "Description"

This handles: gist creation, image conversion to JPEG, URL rewriting, git push. It prints the gisthost URL. Note: gisthost may block inline <script> tags, so the lightbox/click-to-expand won't work.

Phase 6: Apply Selection (Optional)

If the user picks a winner, offer to:

  1. Update the blog post's imagefeature frontmatter
  2. Update any local_image_float_right includes
  3. Copy the chosen image to the blog's images directory with a permanent name
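Step 1 could be sketched as a small frontmatter rewrite (hypothetical helper; assumes Jekyll-style --- delimited frontmatter with a simple imagefeature: value key):

```python
import re

def set_image_feature(markdown_text, image_path):
    # Update an existing imagefeature line, or insert one just after
    # the opening --- delimiter if the key is missing.
    new_line = f"imagefeature: {image_path}"
    if re.search(r"^imagefeature:.*$", markdown_text, flags=re.M):
        return re.sub(r"^imagefeature:.*$", new_line, markdown_text, flags=re.M)
    return markdown_text.replace("---\n", f"---\n{new_line}\n", 1)
```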

Default Raccoon Style

When no --style is given, use this base style (same as gen-image):

Cute anthropomorphic raccoon character with chibi proportions (oversized head, small body), dark raccoon mask markings around eyes, big friendly dark eyes, small black nose, round brown ears with lighter inner ear, soft brown felt/plush fur, striped ringed tail with brown and dark brown bands. Wearing big round rainbow-colored glasses (frames cycle through red, orange, yellow, green, blue, purple), green t-shirt with bold white text, blue denim shorts, IMPORTANT: mismatched Crocs shoes — one BLUE Croc on the left foot and one YELLOW Croc on the right foot (never the same color on both feet). Soft plush 3D/vinyl toy illustration style, studio softbox lighting, clean warm pastel background, subtle vintage film grain, children's book style. Full body.

Error Handling

  • Missing API key: Check ~/.env first, then stop and ask user
  • Generation failure: Report which direction failed, ask to retry or skip
  • showboat not installed: Fall back to plain markdown + pandoc without showboat
  • magick not installed: Warn, use PNG directly (larger files)
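The API-key check might be sketched like this (the variable name GEMINI_API_KEY and the KEY=value format of ~/.env are assumptions):

```python
import os
from pathlib import Path

def find_api_key(var="GEMINI_API_KEY"):
    # Check the environment first, then ~/.env; return None so the
    # caller can stop and ask the user, per the rule above.
    if os.environ.get(var):
        return os.environ[var]
    env_file = Path.home() / ".env"
    if env_file.exists():
        for line in env_file.read_text().splitlines():
            if line.startswith(f"{var}="):
                return line.split("=", 1)[1].strip().strip('"')
    return None
```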

Tips

  • After user picks a winner, they may want to iterate on it — offer to re-run gen-image with refinements
  • If a direction doesn't render well, suggest prompt tweaks rather than just re-rolling
  • Keep the comparison page around for reference even after picking a winner