Evidence Self-loop (C3/C4 fix → rebind → redraft) Skill

Purpose: make the evidence-first pipeline converge without writing filler prose.

This skill reads the intermediate evidence artifacts (briefs/bindings/packs) and produces an actionable TODO list that answers:

- Which subsections are under-supported?
- Is the root cause mapping/coverage (C2), evidence extraction (C3), or binding/planning (C4)?
- Which skills should be rerun, and in what order, to unblock high-quality writing?

Inputs

- outline/subsection_briefs.jsonl
- outline/evidence_bindings.jsonl (expects binding_gaps / binding_rationale if available)
- outline/evidence_drafts.jsonl (expects blocking_missing, comparisons, eval protocol, limitations)
Optional (improves routing):
- outline/evidence_binding_report.md
- outline/anchor_sheet.jsonl
- papers/paper_notes.jsonl
- papers/fulltext_index.jsonl
- queries.md
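A minimal Python sketch of the input check, assuming the paths above are resolved relative to the workspace directory. `check_inputs`, `REQUIRED_INPUTS`, and `OPTIONAL_INPUTS` are hypothetical names, not the actual internals of run.py. Only the three required files are treated as blocking; the optional ones merely weaken routing.

```python
from pathlib import Path

# Required inputs listed above (paths relative to the workspace).
REQUIRED_INPUTS = [
    "outline/subsection_briefs.jsonl",
    "outline/evidence_bindings.jsonl",
    "outline/evidence_drafts.jsonl",
]
# Optional inputs: their absence weakens routing but is not a failure.
OPTIONAL_INPUTS = [
    "outline/evidence_binding_report.md",
    "outline/anchor_sheet.jsonl",
    "papers/paper_notes.jsonl",
    "papers/fulltext_index.jsonl",
    "queries.md",
]


def check_inputs(workspace: Path) -> tuple[list[str], list[str]]:
    """Return (missing_required, missing_optional) as workspace-relative paths.

    Hypothetical helper; not taken from run.py.
    """
    missing_required = [p for p in REQUIRED_INPUTS if not (workspace / p).is_file()]
    missing_optional = [p for p in OPTIONAL_INPUTS if not (workspace / p).is_file()]
    return missing_required, missing_optional
```

For example, `check_inputs(Path("workspaces/demo"))` on an empty workspace reports all three required files as missing, which is the "missing inputs" case of the FAIL status.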

Outputs

output/EVIDENCE_SELFLOOP_TODO.md (report-class; always written)

Self-loop contract (what “fixing evidence” means)

Prefer fixing upstream evidence, not writing around gaps.
If an evidence pack has blocking_missing, treat it as a STOP signal: strengthen notes/fulltext/mapping, then regenerate packs.
If bindings show binding_gaps, treat them as a ROUTING signal: enrich the evidence bank for the mapped papers, expand mapping coverage, or adjust required_evidence_fields if they are unrealistic.

Recommended minimal rerun chains:

If C3 evidence is thin: pdf-text-extractor → paper-notes → evidence-binder → evidence-draft → anchor-sheet → writer-context-pack
If C2 coverage is weak: section-mapper → outline-refiner → (then rerun C3/C4 evidence skills)
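The two chains can be held as ordered data so that a TODO entry always lists the earliest broken stage first. This is a sketch: `rerun_plan` is a hypothetical helper, and rerunning the full C3 chain after a C2 fix is a conservative assumption, since the text only says to "rerun C3/C4 evidence skills".

```python
# Ordered minimal rerun chains (skill names as listed above).
C3_THIN_CHAIN = [
    "pdf-text-extractor",
    "paper-notes",
    "evidence-binder",
    "evidence-draft",
    "anchor-sheet",
    "writer-context-pack",
]
C2_WEAK_CHAIN = ["section-mapper", "outline-refiner"]


def rerun_plan(c2_coverage_weak: bool, c3_evidence_thin: bool) -> list[str]:
    """Return skills to rerun, earliest broken stage first.

    Assumption: a C2 fix changes which papers feed each subsection, so the
    whole C3/C4 evidence chain is rerun afterwards.
    """
    if c2_coverage_weak:
        return C2_WEAK_CHAIN + C3_THIN_CHAIN
    if c3_evidence_thin:
        return list(C3_THIN_CHAIN)
    return []


print(" -> ".join(rerun_plan(c2_coverage_weak=False, c3_evidence_thin=True)))
```

Checking C2 first reflects the rule of fixing the earliest broken artifact: thin C3 evidence is often a symptom of weak mapping.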

Workflow (analysis-only)

Read queries.md (if present)

Use it only as a soft config hint (evidence_mode / draft_profile); do not override the artifact contract.

Read outline/subsection_briefs.jsonl

For each sub_id, capture axes + required_evidence_fields (what evidence types this subsection expects).

Read outline/evidence_bindings.jsonl

For each sub_id, surface binding_rationale and binding_gaps (what the binder could/could not cover from the evidence bank).

(Optional) Read outline/evidence_binding_report.md

Use it as a human-readable summary; treat it as a view of outline/evidence_bindings.jsonl, not a separate truth source.

Read outline/evidence_drafts.jsonl

Surface blocking_missing (STOP signals), and check for missing comparisons / eval protocol / limitations that would force hollow writing.

(Optional) Read outline/anchor_sheet.jsonl

Check whether each subsection has at least a few citation-backed anchors (numbers / evaluation / limitations).

(Optional) Read papers/paper_notes.jsonl and papers/fulltext_index.jsonl

Use these to route fixes: if evidence is abstract-only and missing eval tokens, prefer enriching notes/fulltext before drafting prose.
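A minimal sketch of the read-and-join steps, assuming every JSONL record carries a `sub_id` key and that the briefs define the set of subsections. `load_jsonl`, `merge_by_sub_id`, `read_workspace`, and the `has_draft` flag are hypothetical; only the field names come from the steps above. The view is read-only, matching the analysis-only contract.

```python
import json
from pathlib import Path


def load_jsonl(path: Path) -> list[dict]:
    """Read one JSON object per non-empty line; a missing file yields []."""
    if not path.is_file():
        return []
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def merge_by_sub_id(briefs: list[dict], bindings: list[dict],
                    drafts: list[dict]) -> dict[str, dict]:
    """Build one view per subsection; nothing is written back to outline/*."""
    bindings_by_id = {b["sub_id"]: b for b in bindings if "sub_id" in b}
    drafts_by_id = {d["sub_id"]: d for d in drafts if "sub_id" in d}
    view = {}
    for brief in briefs:
        sub_id = brief.get("sub_id")
        if sub_id is None:
            continue  # skip malformed brief records
        binding = bindings_by_id.get(sub_id, {})
        draft = drafts_by_id.get(sub_id, {})
        view[sub_id] = {
            # What the subsection expects (from the brief).
            "axes": brief.get("axes", []),
            "required_evidence_fields": brief.get("required_evidence_fields", []),
            # What the binder could or could not cover.
            "binding_rationale": binding.get("binding_rationale"),
            "binding_gaps": binding.get("binding_gaps", []),
            # STOP signals from the evidence pack.
            "blocking_missing": draft.get("blocking_missing", []),
            # False when the subsection has no evidence pack at all.
            "has_draft": bool(draft),
        }
    return view


def read_workspace(ws: Path) -> dict[str, dict]:
    return merge_by_sub_id(
        load_jsonl(ws / "outline/subsection_briefs.jsonl"),
        load_jsonl(ws / "outline/evidence_bindings.jsonl"),
        load_jsonl(ws / "outline/evidence_drafts.jsonl"),
    )
```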

What the report contains

Summary counts: the number of subsections with blocking_missing and with binding_gaps, plus the most common failure reasons.
Per-subsection TODO: the smallest upstream fix path (skills + artifacts) to make the subsection writeable.
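The summary counts could be computed as in the following sketch. It assumes a dict keyed by sub_id whose values hold `blocking_missing` and `binding_gaps` as lists of short reason strings; that shape, `summarize`, and `top_n` are assumptions. The `view` below is toy input for illustration only.

```python
from collections import Counter


def summarize(view: dict[str, dict], top_n: int = 5) -> dict:
    """Count affected subsections and the most frequent failure reasons.

    `view` maps sub_id -> {"blocking_missing": [...], "binding_gaps": [...]}.
    """
    with_blocking = sorted(s for s, v in view.items() if v.get("blocking_missing"))
    with_gaps = sorted(s for s, v in view.items() if v.get("binding_gaps"))
    reasons = Counter()
    for v in view.values():
        reasons.update(v.get("blocking_missing", []))
        reasons.update(v.get("binding_gaps", []))
    return {
        "subsections_with_blocking_missing": with_blocking,
        "subsections_with_binding_gaps": with_gaps,
        "common_reasons": reasons.most_common(top_n),
    }


# Toy input, not real pipeline output.
view = {
    "3.1": {"blocking_missing": ["no concrete evaluation tokens"], "binding_gaps": []},
    "3.2": {"blocking_missing": [], "binding_gaps": ["benchmarks"]},
    "3.3": {"blocking_missing": ["no concrete evaluation tokens"], "binding_gaps": ["benchmarks"]},
}
print(summarize(view)["common_reasons"])
```

Counting reasons across subsections highlights failures worth fixing once upstream (for example, one paper-notes rerun) instead of per H3.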

Status semantics (unblock rules)

This skill is the prewrite router for evidence quality. Treat its Status: line as the unblock contract:

PASS: no blocking_missing and no binding_gaps -> proceed to C5 writing (but still scan for non-blocking writability smells: few comparisons, thin evaluation details, or few anchors often predict hollow prose).
OK: no blocking_missing, but some binding_gaps -> you may draft, but expect weaker specificity; prefer fixing gaps first.
FAIL: missing inputs OR any blocking_missing -> do not write filler prose; fix upstream and rerun C3/C4.
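The unblock rules reduce to a small decision function. A sketch under stated assumptions: `decide_status` and `writability_smells` are hypothetical names, and the smell thresholds (`min_comparisons`, `min_anchors`) are illustrative defaults that this skill does not define. Smells only add warnings and never change the status.

```python
def decide_status(inputs_missing: bool, n_blocking: int, n_gaps: int) -> str:
    """Map evidence signals to the Status: line (FAIL > OK > PASS)."""
    if inputs_missing or n_blocking > 0:
        return "FAIL"  # do not write; fix upstream and rerun C3/C4
    if n_gaps > 0:
        return "OK"  # drafting allowed, but expect weaker specificity
    return "PASS"  # proceed to C5 writing


def writability_smells(n_comparisons: int, n_anchors: int, has_eval_protocol: bool,
                       min_comparisons: int = 2, min_anchors: int = 3) -> list[str]:
    """Non-blocking warnings; the thresholds are hypothetical defaults."""
    smells = []
    if n_comparisons < min_comparisons:
        smells.append("few comparisons")
    if not has_eval_protocol:
        smells.append("no eval protocol")
    if n_anchors < min_anchors:
        smells.append("few anchors")
    return smells


print("Status:", decide_status(inputs_missing=False, n_blocking=0, n_gaps=2))
```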

Routing matrix (symptom -> root cause -> upstream fix)

Use this as a semantic routing table (not a script checklist). The goal is to fix the earliest broken intermediate artifact.

| Symptom (where you see it) | Likely root cause | Inspect first | Smallest upstream fix chain |
|---|---|---|---|
| evidence_drafts.blocking_missing: no usable citation keys | mapped papers lack bibkey / bibkeys not in citations/ref.bib | papers/paper_notes.jsonl (bibkey fields), citations/ref.bib | C3 paper-notes (ensure bibkeys) -> C4 citation-verifier -> rerun evidence-binder -> rerun evidence-draft |
| blocking_missing: title-only evidence | retrieval/metadata lacks abstracts (or aggressive filtering) | papers/papers_raw.jsonl abstracts, papers/paper_notes.jsonl evidence_level | C1 literature-engineer (enrich metadata) OR C3 pdf-text-extractor (fulltext) -> rerun paper-notes |
| blocking_missing: no evidence snippets extractable | notes are too thin / evidence bank empty for mapped papers | papers/evidence_bank.jsonl (counts), papers/paper_notes.jsonl | C3 paper-notes (richer extraction; prefer fulltext when possible) -> rerun C4 packs |
| blocking_missing: no concrete evaluation tokens | notes/bank did not extract benchmarks/metrics/budgets | papers/paper_notes.jsonl (metrics/benchmarks fields), outline/anchor_sheet.jsonl | C3 paper-notes (extract eval anchors) -> rerun anchor-sheet + evidence-draft |
| evidence pack comparisons are sparse (signals: comparisons low) | clusters are not contrastable OR mapping coverage too weak | outline/subsection_briefs.jsonl (clusters), outline/mapping.tsv | C2 section-mapper (coverage) OR C3 subsection-briefs (better clusters) -> rerun evidence-draft |
| bindings.binding_gaps mentions benchmarks/metrics/protocol | binder cannot find evaluation-tagged evidence for this subsection | outline/evidence_binding_report.md (tag mix), papers/evidence_bank.jsonl tags | C3 paper-notes (tag/evidence extraction) OR C2 expand mapping for that subsection -> rerun evidence-binder |
| binding_gaps mentions security/threat model/attacks | mapped set lacks security-focused works or notes lack threat-model detail | outline/mapping.tsv, papers/paper_notes.jsonl | C2 expand mapping (+ C1 queries if needed) OR C3 enrich notes -> rerun binder/packs |
| binding report looks mechanically uniform across H3 (same mix, low tag variance) | binder selection too recipe-like OR evidence bank tags too coarse | outline/evidence_binding_report.md (tag mix), evidence bank tags | tighten required_evidence_fields + improve evidence bank tags, then rerun binder; avoid writing around non-specific bindings |
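If the TODO report should quote the matching fix chain, part of the table can also be held as data. The sketch below is a hypothetical helper: the keyword needles are assumptions derived from the symptom column, and substring matching only shortlists candidate rows. It does not replace the semantic judgment the table asks for.

```python
# Hypothetical encoding of part of the routing matrix:
# (needles, likely root cause, fix chain copied from the table).
ROUTES = [
    (("no usable citation keys",),
     "mapped papers lack bibkey / bibkeys not in citations/ref.bib",
     "C3 paper-notes (ensure bibkeys) -> C4 citation-verifier -> rerun evidence-binder -> rerun evidence-draft"),
    (("title-only evidence",),
     "retrieval/metadata lacks abstracts (or aggressive filtering)",
     "C1 literature-engineer (enrich metadata) OR C3 pdf-text-extractor (fulltext) -> rerun paper-notes"),
    (("no evidence snippets",),
     "notes are too thin / evidence bank empty for mapped papers",
     "C3 paper-notes (richer extraction; prefer fulltext when possible) -> rerun C4 packs"),
    (("no concrete evaluation tokens",),
     "notes/bank did not extract benchmarks/metrics/budgets",
     "C3 paper-notes (extract eval anchors) -> rerun anchor-sheet + evidence-draft"),
    (("benchmark", "metric", "protocol"),
     "binder cannot find evaluation-tagged evidence for this subsection",
     "C3 paper-notes (tag/evidence extraction) OR C2 expand mapping for that subsection -> rerun evidence-binder"),
    (("security", "threat model", "attack"),
     "mapped set lacks security-focused works or notes lack threat-model detail",
     "C2 expand mapping (+ C1 queries if needed) OR C3 enrich notes -> rerun binder/packs"),
]


def candidate_routes(symptom: str) -> list[tuple[str, str]]:
    """Return (root cause, fix chain) for every row with a needle in the symptom."""
    text = symptom.lower()
    return [(cause, chain) for needles, cause, chain in ROUTES
            if any(needle in text for needle in needles)]
```

Returning every matching row, rather than the first, keeps multi-cause symptoms visible so the earliest broken artifact can be chosen deliberately.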

Interface with the writer self-loop (avoid writing around evidence)

If writer-selfloop is FAIL due to missing anchors/comparisons and the corresponding writer pack has pack_warnings, stop and run this evidence self-loop: the section is telling you the pack is not writeable.
Prefer fixing evidence gaps once, upstream, rather than patching every H3 with generic filler.
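The hand-off condition can be stated as a predicate. A sketch, assuming the writer self-loop exposes its status string, its failure reasons as strings, and the writer pack's pack_warnings as a list; those shapes and the function name are hypothetical.

```python
def should_run_evidence_selfloop(writer_status: str, fail_reasons: list[str],
                                 pack_warnings: list[str]) -> bool:
    """True when a writer FAIL is really an evidence problem (hypothetical shapes)."""
    # Only anchor/comparison failures point back at the evidence pack.
    evidence_related = any(
        "anchor" in reason.lower() or "comparison" in reason.lower()
        for reason in fail_reasons
    )
    return writer_status == "FAIL" and evidence_related and bool(pack_warnings)
```

When this returns True, the fix belongs upstream in the pack, not in the H3 prose.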

What this skill does NOT do

It does not edit papers/*, outline/*, or sections/*.
It does not invent new facts/citations.
It does not "relax" quality by changing thresholds; it routes you to the earliest artifact to fix.

Script

Quick Start

python .codex/skills/evidence-selfloop/scripts/run.py --workspace workspaces/<ws>

All Options

--workspace <dir>
--unit-id <U###> (optional)
--inputs <semicolon-separated> (optional override)
--outputs <semicolon-separated> (optional override; default writes output/EVIDENCE_SELFLOOP_TODO.md)
--checkpoint <C#> (optional)
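A minimal argparse sketch that mirrors the options above, for readers wiring the skill into their own tooling. It is an assumption about how the flags behave, not the actual parser in run.py: `--workspace` is assumed required because it is not marked optional, the default output path comes from the text, and `split_paths` is a hypothetical helper for the semicolon-separated lists.

```python
import argparse


def split_paths(value: str) -> list[str]:
    """Split a semicolon-separated list, dropping empty entries."""
    return [p.strip() for p in value.split(";") if p.strip()]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Evidence self-loop (sketch)")
    parser.add_argument("--workspace", required=True)
    parser.add_argument("--unit-id", default=None)
    parser.add_argument("--inputs", type=split_paths, default=None)
    # A non-string default is not passed through `type`, so it stays a list.
    parser.add_argument("--outputs", type=split_paths,
                        default=["output/EVIDENCE_SELFLOOP_TODO.md"])
    parser.add_argument("--checkpoint", default=None)
    return parser
```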

Examples

Generate the evidence TODO list once the C4 packs exist:
- python .codex/skills/evidence-selfloop/scripts/run.py --workspace workspaces/<ws>
