Agent Skills: Visual Verify Scores

This skill should be used when the user asks to 'verify visual output', 'check how it looks', 'render and review', 'visual verify', 'check the slide', 'does this look right', or when any task produces rendered visual output (slides, charts, documents, UI). Starts a render-vision-fix loop using Gemini vision.

ID: edwinhu/workflows/visual-verify

Install this agent skill locally:

pnpm dlx add-skill https://github.com/edwinhu/workflows/tree/HEAD/skills/visual-verify


skills/visual-verify/SKILL.md

Skill Metadata

Name: visual-verify
Description: "This skill should be used when the user asks to 'verify visual output', 'check how it looks', 'render and review', 'visual verify', 'check the slide', 'does this look right', or when any task produces rendered visual output (slides, charts, documents, UI). Starts a render-vision-fix loop using Gemini vision."

Announce: "I'm using visual-verify to set up a render-vision-fix loop."

<EXTREMELY-IMPORTANT>

The Iron Law

NO VISUAL TASK IS COMPLETE WITHOUT RENDERING, SCORING, AND MEETING THE THRESHOLD.

Source code correctness does NOT imply visual correctness. You MUST render to PNG, score with context-enriched Gemini vision (0-10), and iterate until score >= 9.5. Claiming "done" with a score below threshold delivers broken visuals to the user.

Skipping the score check is NOT HELPFUL — the user gets a visual artifact with defects you didn't verify. </EXTREMELY-IMPORTANT>

Agentic Vision Routing

Route based on image complexity, not source language.

Agentic vision (--agentic) enables Gemini's Think-Act-Observe loop: it can crop, zoom, annotate, and re-examine regions of the image autonomously. This catches fine-grained defects that full-image viewing misses.

| Image characteristic | --agentic? | Why |
|---|---|---|
| Dense diagram (5+ nodes, many arrows/labels) | YES | Can crop individual arrow paths, zoom into small labels to check connections |
| Small text or labels near container edges | YES | Implicit zoom catches clipping that full-image view misses |
| Side-by-side or before/after comparison | YES | Can crop matching regions and annotate differences |
| Simple layout (2-3 large elements) | NO | Full image is sufficient; agentic adds latency for no gain |
| Single chart with large text | NO | No fine-grained details to investigate |

Decision rule: If the image has details that are easy to miss at full resolution, use --agentic. If you can see everything clearly at a glance, skip it.
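The decision rule could be sketched as a small routing helper. The signals and the 5-node threshold here are illustrative assumptions taken from the table above, not part of the skill itself:

```python
def choose_vision_mode(node_count: int, has_small_labels: bool,
                       is_comparison: bool) -> list[str]:
    """Build a look-at command, adding --agentic for complex images.

    Dense diagrams, small labels near edges, and comparisons benefit from
    Gemini's Think-Act-Observe loop; simple layouts only pay its latency.
    """
    base = ["python3", "../look-at/scripts/look_at.py",
            "--file", "/tmp/visual-verify.png"]
    if node_count >= 5 or has_small_labels or is_comparison:
        return base + ["--agentic"]
    return base
```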

The Loop

0. PAGE MAP -> If Typst + Touying: Skill("teaching:find-slide-page")
       |      Returns heading → physical page mapping
       |      Skip if: single-page file, non-Typst, or page already known
       |
1. CHANGE  -> Modify source code (Task agent)
       |
2. RENDER  -> Produce PNG (see references/render-commands.md)
       |      Render fails? -> fix source, back to step 1
       |
3. VISION  -> Two-part check:
       |      a) INTENT — Does the render match the design intent?
       |         (Does the visual structure argue what it should?)
       |      b) DEFECTS — Scan for the 9 visual defect categories:
       |         1. Text clipped by or overflowing its container
       |         2. Text or shapes overlapping other elements
       |         3. Arrows crossing through elements instead of routing around
       |         4. Arrows landing on wrong element or pointing into empty space
       |         5. Labels floating ambiguously (not anchored to what they describe)
       |         6. Labels squeezed between adjacent elements without clearance
       |         7. Uneven spacing (cramped sections next to spacious ones)
       |         8. Text too small to read at rendered size
       |         9. Parallel sub-diagrams with inconsistent layout
       |      Complexity-routed look-at call with SCORING:
       |      Dense/fine-grained? -> --agentic (Gemini crops/zooms/annotates)
       |      Simple/large elements? -> vision-only (structured pixel feedback)
       |      → Score 0-10 against checklist items
       |      → Record in SCORES.md
       |
4. DECIDE  -> Score >= 9.5 AND exit criteria met? → DONE
              Score < 9.5 OR defects remain?      → extract fixes, back to step 1
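Stripped of the domain detail, the loop above reduces to a short driver. The three callables are hypothetical stand-ins for the Task agent, the render command, and the look-at vision call; only the threshold and iteration cap come from this skill:

```python
from typing import Callable

THRESHOLD = 9.5
MAX_ITERATIONS = 5

def verify_loop(apply_fixes: Callable[[list[str]], None],
                render: Callable[[], bool],
                score_with_vision: Callable[[], tuple[float, list[str]]]) -> bool:
    """Drive the render-vision-fix loop; return True when the threshold is met."""
    for _ in range(MAX_ITERATIONS):
        if not render():              # render failed -> fix source, back to step 1
            apply_fixes(["render failed"])
            continue
        score, defects = score_with_vision()
        if score >= THRESHOLD and not defects:
            return True               # DONE: threshold met, exit criteria satisfied
        apply_fixes(defects)          # extract fixes, back to step 1
    return False                      # 5 iterations exhausted -> escalate to the user
```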

When to Stop

The loop is done when ALL of these hold:

  • Score >= 9.5
  • The rendered output matches the design intent (not just defect-free but correct)
  • No text is clipped, overlapping, or unreadable
  • All arrows/lines connect to the right elements and route cleanly
  • Spacing is consistent and composition is balanced
  • You'd show it to someone without caveats

Don't stop after one clean pass just because there are no critical bugs — if the composition could be better, improve it. Conversely, don't loop forever on cosmetics — 5 iterations max before escalating to the user.

Invocation

Skill(skill="ralph-loop:ralph-loop", args="Visual Task N: [TASK NAME] --max-iterations 5 --completion-promise VTASKN_9_5")

Score Tracking

Initialize SCORES.md before the first iteration:

# Visual Verify Scores

| Iteration | Score | Threshold | BLOCKING | COSMETIC | Delta |
|-----------|-------|-----------|----------|----------|-------|

Each vision call must score the output 0-10:

  • 10.0 = all checklist items pass, zero issues
  • 9.5 = 95% pass, 1-2 cosmetic issues remain (default threshold)
  • < 9.0 = BLOCKING issues present

The score reflects the fraction of checklist items that pass. Gemini counts BLOCKING and COSMETIC issues against the domain-specific checklist, and the score = (items passing / total items) * 10.
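The scoring rule is literally a pass fraction; a minimal sketch:

```python
def checklist_score(results: dict[str, bool]) -> float:
    """Score = (items passing / total items) * 10, as defined above."""
    if not results:
        raise ValueError("checklist must not be empty")
    passing = sum(results.values())
    return round(passing / len(results) * 10, 1)
```

With a 20-item checklist, one failure yields exactly the 9.5 default threshold.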

Vision Calls

Agentic (dense diagrams, fine-grained details, comparisons):

python3 ../look-at/scripts/look_at.py \
    --file "/tmp/visual-verify.png" \
    --goal "[CONTEXT-ENRICHED GOAL]" \
    --agentic

(Paths are relative to this skill's base directory.)

Gemini will autonomously crop, zoom, and annotate regions it wants to inspect more closely. This is most valuable for:

  • Verifying arrow connections in dense node diagrams
  • Checking whether small labels are fully visible (not clipped)
  • Comparing spatial consistency across sub-diagrams

Vision-only (simple layouts, large elements):

python3 ../look-at/scripts/look_at.py \
    --file "/tmp/visual-verify.png" \
    --goal "[CONTEXT-ENRICHED GOAL]"

Goal Assembly

NEVER call look-at with a generic goal. Goals must reference the spec, checklist items, and prior feedback.

| Context Piece | Source |
|---------------|--------|
| spec_text | SPEC.md, PLAN.md task, or user request |
| checklist_items | Domain + task specific |
| previous_feedback | Gemini's output from prior iteration |

For diagrams (fletcher, CeTZ): use describe-first strategy, NOT judgment prompts.

  • Step 1: Ask Gemini to transcribe all text elements with [FULL | PARTIAL] annotations
  • Step 2: Claude diffs transcription against expected elements from source code
  • Step 3 (optional): Run spatial/overlap check for non-clipping issues
  • Why: Judgment prompts ("is X visible?") invite yes-man answers. Transcription forces the model to report what it actually sees — clipping becomes objectively detectable via text mismatches.
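Step 2's diff is mechanical. Assuming the transcription comes back as (text, FULL|PARTIAL) pairs — the exact format is an assumption — the check could look like:

```python
def find_clipped(expected: list[str],
                 transcription: list[tuple[str, str]]) -> list[str]:
    """Diff expected text elements against a [FULL | PARTIAL] transcription.

    Any element that is absent or only partially visible becomes an
    objective mismatch rather than a judgment call.
    """
    seen = {text: status for text, status in transcription}
    issues = []
    for text in expected:
        status = seen.get(text)
        if status is None:
            issues.append(f"MISSING: {text!r}")
        elif status != "FULL":
            issues.append(f"CLIPPED: {text!r} ({status})")
    return issues
```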

See references/goal-templates.md for full copy-paste templates per domain.

Translating Non-Python Feedback

Before editing Typst source to apply fixes, load the tinymist skill:

Skill(skill="tinymist:typst")

It has Fletcher/CeTZ reference docs with correct parameter names (label-side, label-pos, spacing, inset). Without it, you will guess at parameter names and waste iterations.

Gemini returns structured pixel measurements. Claude translates to source code:

| Gemini says | Claude translates to (Typst example) |
|-------------|--------------------------------------|
| "Move label 15px left" | Adjust label-pos or node coordinates by ~0.5em |
| "Text clipped at right edge" | Increase inset or reduce scale() percentage |
| "Node/diagram cut off at edge" | Reduce canvas length, node inset, or spacing; or shift coordinates toward center |
| "Elements overlap vertically" | Increase spacing parameter |
| "Font too small to read" | Increase #set text(Npt) value |
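For repeated loops, the translation table can be approximated as a keyword lookup. The keys and suggested edits below are an illustrative sketch, not an exhaustive or authoritative mapping:

```python
# Illustrative mapping from Gemini's pixel-level feedback phrases to the
# kind of Typst edit each usually implies (fletcher/CeTZ parameter names).
FEEDBACK_TO_FIX = {
    "move label": "adjust label-pos or node coordinates (~0.5em per ~15px)",
    "clipped": "increase inset or reduce scale() percentage",
    "cut off": "reduce canvas size, node inset, or spacing; recenter coordinates",
    "overlap": "increase the spacing parameter",
    "too small": "raise the #set text(Npt) value",
}

def suggest_fix(feedback: str) -> str:
    """Return the first matching translation, or a fallback."""
    lowered = feedback.lower()
    for key, fix in FEEDBACK_TO_FIX.items():
        if key in lowered:
            return fix
    return "no canned translation; inspect the source directly"
```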

Complex Diagrams (3+ Failed Iterations)

If the same spatial issue persists after 3 iterations, escalate to the reference sketch approach: have Gemini draw an ideal layout in matplotlib and translate coordinates.

See references/complex-diagram-strategy.md for the full approach.

Quick Render Reference

| Domain | Command |
|--------|---------|
| Typst | tinymist compile input.typ /tmp/visual-verify.png --pages N --ppi 144 (use /find-slide-page to get N) |
| Python | python3 script.py (script saves to known path) |
| Screenshot | screencapture -x /tmp/visual-verify.png |

See references/render-commands.md for the full reference.

When NOT to Use

  • One-off visual checks: Use look-at directly, not the full loop
  • Text-only verification: Use standard dev-verify
  • Compilation checks only: Just run the compile command

Reference Files

  • references/goal-templates.md -- copy-paste goal templates per domain
  • references/render-commands.md -- render commands for all supported domains
  • references/rationalization-prevention.md -- excuses, red flags, honesty framing, drive-aligned consequences
  • references/complex-diagram-strategy.md -- reference sketch approach for persistent layout failures
  • references/examples.md -- worked examples (Typst slide, matplotlib chart, diagram escalation)