Prose Polish Redline
Composable prose-editing system that runs focused agents in parallel, merges their edits with conflict resolution, and outputs tracked-changes .docx + animated HTML replay.
Fidelity Firewall (binds every editing agent — never violate)
This system auto-applies every agent's edits to a tracked-changes .docx. There is no human in the loop between an agent's JSON and the document, so an invented fact lands in the author's file looking authoritative. Therefore the hard rule, for all Phase-1 and Phase-2 agents:
No edit may introduce a citation, statistic, number, date, name, quote, study, or fact that is not already present in the source document. Editing means sharpening the author's words — never adding evidence on their behalf.
When a claim is weak because it lacks a specific the document doesn't contain, an agent has exactly three honest moves:
- Sharpen from what's there: recast using only material already in the document.
- comment: flag the gap for the author ("cite the specific study", "what metric supports this?"). Do not fill it in.
- Soften the claim to match the evidence actually present.
insert and replace are the danger paths. A replace may only swap phrasing, never inject a new fact. An insert has no original_text to match, so the verbatim guard does NOT protect against invented content — an insert's new_text may add transitions/conjunctions/connective phrasing only, never a citation/number/name/date/fact. A fabricated citation that happens to match verbatim still passes every other guard; this firewall is the only thing that stops it. The merge/apply scripts do not check fidelity — the agents are the sole gate.
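To make "introduces a number not in the source" concrete, here is a rough heuristic sketch. It is illustrative only: the pipeline performs no such check, and the helper name `new_numbers` and its regex are assumptions. It would not catch invented names, quotes, or citations, so it cannot replace the agents' own discipline.

```python
import re

# Hypothetical helper for illustration; the merge/apply scripts contain no such check.
# Matches digit runs with optional decimal/thousands separators and a trailing percent sign.
NUMBER = re.compile(r"\d+(?:[.,]\d+)*%?")


def new_numbers(source: str, new_text: str) -> set[str]:
    """Return numeric tokens in new_text that do not occur as tokens in source."""
    present = set(NUMBER.findall(source))
    return {tok for tok in NUMBER.findall(new_text) if tok not in present}


source = "Revenue grew 12% in 2021."
honest = "In 2021, revenue grew 12%."
invented = "Revenue grew 12% in 2021, up from 9%."
```

With these samples, `new_numbers(source, honest)` is empty, while `new_numbers(source, invented)` contains `"9%"`, the figure the source never mentions.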
Quick Start
/prose-polish-redline essays/range-framework-essay.md
/prose-polish-redline essays/range-framework-essay.md --depth aggressive
/prose-polish-redline essays/range-framework-essay.md --depth conservative --genre academic
/prose-polish-redline essays/range-framework-essay.md --dry-run
Pipeline
INPUT: essay.md [--depth moderate] [--genre academic] [--dry-run]
│
├── md_to_docx.py → .docx
├── extract_text.py → plain text (canonical for all agents)
│
├── Wave 0: genre-scorer → genre + 6D quality profile
│
├── Wave 1 (parallel): Phase 1 kata agents → edit JSONs
│   ├── coherence-agent.md
│   ├── authority-agent.md
│   ├── claims-agent.md
│   └── stakes-agent.md
│
├── merge_edits.py → merged Phase 1 edits
│
├── Wave 2 (parallel): Phase 2 kata agents → edit JSONs
│   ├── rhythm-agent.md
│   ├── hedge-agent.md
│   ├── personality-agent.md
│   └── perspective-agent.md
│
├── merge_edits.py → final merged review JSON
│
├── apply_redlines.py → tracked-changes .docx
└── generate_replay.py → animated HTML replay
OUTPUT: {stem}_reviewed.docx + {stem}_replay.html + {stem}_review.json
Process
Step 0: Setup
- Determine the working directory: use the input file's directory
- Set depth from CLI arg or default to moderate
- Set genre from CLI arg or auto-detect in Wave 0
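The setup can be sketched as a small path-planning helper. This is a minimal sketch, assuming the three outputs are written next to the input file; the function name `plan_outputs` and the dictionary keys are hypothetical, while the `{stem}_...` file names come from the pipeline's OUTPUT line.

```python
from pathlib import Path


def plan_outputs(input_path: str) -> dict[str, Path]:
    """Derive the working directory and output paths from the input file."""
    src = Path(input_path)
    workdir = src.parent  # working directory = the input file's directory
    stem = src.stem       # file name without its extension
    return {
        "reviewed": workdir / f"{stem}_reviewed.docx",
        "replay": workdir / f"{stem}_replay.html",
        "review": workdir / f"{stem}_review.json",
    }


paths = plan_outputs("essays/range-framework-essay.md")
```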
Step 1: Convert and Extract
python ~/.claude/skills/prose-polish-redline/scripts/md_to_docx.py INPUT.md OUTPUT.docx
python ~/.claude/skills/prose-polish-redline/scripts/extract_text.py OUTPUT.docx -o EXTRACTED.txt
Report: "Converted {INPUT.md} → {OUTPUT.docx} ({N} paragraphs extracted)"
Step 2: Wave 0 — Genre Scoring
Run the genre-scorer agent with the extracted text.
Input to agent: Full contents of EXTRACTED.txt
Agent prompt: Load agents/genre-scorer.md and follow its instructions
Output: Genre result JSON (genre, scores, priorities)
Report: "Genre: {genre} (confidence: {confidence}). Priority dimensions: {priorities}"
Save genre result to {stem}_genre.json
Step 3: Wave 1 — Phase 1 Agents (Parallel)
Based on depth, launch Phase 1 agents in parallel:
| Depth | Agents |
|-------|--------|
| conservative | coherence-agent, authority-agent |
| moderate | coherence-agent, authority-agent, claims-agent, stakes-agent |
| aggressive | coherence-agent, authority-agent, claims-agent, stakes-agent |
Input to each agent:
- Full contents of EXTRACTED.txt
- Genre result JSON from Wave 0
- The agent's prompt from agents/{agent-name}.md
- The edit schema from references/edit-schema.md
Critical instruction for each agent: "You MUST use the FULL contents of EXTRACTED.txt passed below as your source document. Do not summarize, truncate, or paraphrase it. Your original_text must be a verbatim character-for-character copy-paste from this document text. This is code-level exact string matching — if your original_text is off by even one character, the edit will silently fail."
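Why a single character matters can be shown with plain substring search. This is a sketch, not the code in merge_edits.py (which is not reproduced here); `locate` is a hypothetical stand-in that assumes matching is an exact `str.find` with no normalization.

```python
def locate(document: str, original_text: str) -> int:
    """Return the character offset of original_text in document, or -1 if absent."""
    return document.find(original_text)


document = "The framework isn't new. It has been tested widely."

# Verbatim copy: found at its exact offset.
exact = locate(document, "It has been tested widely.")

# A curly apostrophe (U+2019) in place of the straight one: no match at all.
off_by_one = locate(document, "The framework isn\u2019t new.")
```

Here `exact` is `25` and `off_by_one` is `-1`. An agent that "tidies" punctuation or whitespace while quoting produces the second case, and the edit is dropped.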
Model guidance: For authority-agent, claims-agent, and stakes-agent, consider using model: "opus" for better instruction-following on the verbatim constraint. These agents are prone to fabricating original_text. Coherence-agent works well with sonnet.
Output from each agent: Edit JSON per edit-schema.md
Save each agent's output to {stem}_{agent-name}.json
Report: "Wave 1 complete: {N} total edits from {M} agents"
Step 4: Phase 1 Merge
python ~/.claude/skills/prose-polish-redline/scripts/merge_edits.py \
  --document EXTRACTED.txt \
  --phase1 {stem}_coherence-agent.json {stem}_authority-agent.json ... \
  -o {stem}_phase1_merged.json
Report per-agent match rates from the merge JSON's stats.per_agent field:
Phase 1 merge: {final_count} edits kept ({duplicates} dupes, {conflicts} conflicts)
coherence-agent: 14/15 matched
claims-agent: 6/6 matched
authority-agent: 0/6 matched ⚠️ — re-run recommended
stakes-agent: 0/4 matched ⚠️ — re-run recommended
Match-rate gate: If any agent has 0% match rate with >0 input edits, warn the operator explicitly. Do not silently continue. Suggest: "Agent {name} produced {N} edits but none matched the document. Consider re-running with opus model, or check that the full document text was passed to the agent."
Also report unmatched edits from the merge JSON's unmatched array — these show which specific original_text values failed to locate.
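The match-rate gate could be sketched as follows. The shape of `stats.per_agent` is an assumption here (agent name mapped to `input` and `matched` counts); check the real merge JSON before relying on these key names. The warning text is the one suggested above.

```python
def zero_match_warnings(per_agent: dict[str, dict[str, int]]) -> list[str]:
    """Build one warning per agent that submitted edits but matched none of them."""
    warnings = []
    for name, counts in per_agent.items():
        n_input, n_matched = counts["input"], counts["matched"]
        # An agent with zero input edits is normal and does not trigger the gate.
        if n_input > 0 and n_matched == 0:
            warnings.append(
                f"Agent {name} produced {n_input} edits but none matched the document. "
                "Consider re-running with opus model, or check that the full document "
                "text was passed to the agent."
            )
    return warnings


# Assumed layout of stats.per_agent, using the sample numbers from the report above.
per_agent = {
    "coherence-agent": {"input": 15, "matched": 14},
    "claims-agent": {"input": 6, "matched": 6},
    "authority-agent": {"input": 6, "matched": 0},
    "stakes-agent": {"input": 4, "matched": 0},
}
warnings = zero_match_warnings(per_agent)
```

With the sample numbers, two warnings are produced: one for authority-agent and one for stakes-agent.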
If --dry-run is set: Stop here. Report match rates and edit counts, then skip Steps 5-8.
Step 5: Wave 2 — Phase 2 Agents (Parallel)
Skip if depth is conservative.
Based on depth, launch Phase 2 agents in parallel:
| Depth | Agents |
|-------|--------|
| moderate | rhythm-agent, hedge-agent |
| aggressive | rhythm-agent, hedge-agent, personality-agent, perspective-agent |
Input to each agent:
- Phase-1-edited text (apply Phase 1 edits to extracted text to produce this)
- Genre result JSON from Wave 0
- The agent's prompt from agents/{agent-name}.md
- The edit schema from references/edit-schema.md
Critical instruction: "You are receiving Phase-1-edited text. Your original_text must match THIS text, not the original document."
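Producing the Phase-1-edited text might look like the sketch below. It assumes each edit is a dict with a `type` field alongside `original_text` and `new_text`; the `type` key and the function name are hypothetical, and the real contract lives in references/edit-schema.md. Only replace edits are modeled: comments leave the text unchanged, and inserts are skipped because their anchoring is not described here.

```python
def apply_replacements(text: str, edits: list[dict]) -> str:
    """Apply replace edits in order; skip anything this sketch does not model."""
    for edit in edits:
        if edit.get("type") != "replace":
            continue  # comments change nothing; insert anchoring is not modeled here
        original = edit["original_text"]
        if original in text:
            # Replace the first occurrence only, so repeated phrases are not all rewritten.
            text = text.replace(original, edit["new_text"], 1)
    return text


extracted = "It is very important. It works."
phase1_edits = [
    {"type": "replace", "original_text": "very important", "new_text": "critical"},
    {"type": "comment", "original_text": "It works.", "new_text": ""},
]
phase1_text = apply_replacements(extracted, phase1_edits)
```

After this step "very important" no longer exists in `phase1_text`, which is why a Phase 2 agent quoting the original document would fail to match.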
Model guidance: For hedge-agent, consider using model: "opus" for better instruction-following on the verbatim constraint. The hedge-agent's connective-diagnostic kata is prone to fabricating connectives (inserting "However", "Moreover" into original_text that doesn't contain them). Rhythm-agent works well with sonnet.
Save each agent's output to {stem}_{agent-name}.json
Report: "Wave 2 complete: {N} total edits from {M} agents"
Step 6: Final Merge
python ~/.claude/skills/prose-polish-redline/scripts/merge_edits.py \
  --document EXTRACTED.txt \
  --phase1 {stem}_coherence-agent.json {stem}_authority-agent.json ... \
  --phase2 {stem}_rhythm-agent.json {stem}_hedge-agent.json ... \
  -o {stem}_review.json
Report final merge with per-agent breakdown (same format as Step 4). Include Phase 2 agents in the per-agent report.
Step 7: Apply Redlines
If --dry-run is set: Skip this step and Step 8.
python ~/.claude/skills/prose-polish-redline/scripts/apply_redlines.py \
  OUTPUT.docx {stem}_review.json OUTPUT_DIR
Report: "Redlined document: {path} (match rate: {rate}%)"
Match rate warnings:
- 90%+: Excellent — proceed
- 80-89%: Good — note unmatched edits in output
- <80%: Warning — investigate unmatched edits, likely text normalization issues
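The thresholds above reduce to a small classifier. This is a sketch with a hypothetical function name; treating fractional rates between 89 and 90 as "Good" is an assumption, since the bands are stated in whole percentages.

```python
def match_rate_status(rate: float) -> str:
    """Map a match rate in percent to the status label used in the report."""
    if rate >= 90:
        return "Excellent"  # proceed
    if rate >= 80:
        return "Good"       # note unmatched edits in the output
    return "Warning"        # investigate unmatched edits
```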
Step 8: Generate Replay
python ~/.claude/skills/prose-polish-redline/scripts/generate_replay.py \
  OUTPUT.docx {stem}_review.json \
  -o {stem}_replay.html
Report: "Replay generated: {path} ({size})"
Step 9: Summary
Present a final summary:
PROSE POLISH REDLINE COMPLETE
Document: {input}
Genre: {genre} ({confidence})
Depth: {depth}
Agents: {count}
Quality Profile (before):
Craft: {score}/10
Coherence: {score}/10
Authority: {score}/10
Purpose: {score}/10
Voice: {score}/10
Edits by Tier:
STRUCTURAL: {count}
COHERENCE: {count}
AUTHORITY: {count}
CRAFT: {count}
VOICE: {count}
Match Rate: {rate}%
Per Agent:
coherence-agent: {matched}/{input} matched
authority-agent: {matched}/{input} matched
claims-agent: {matched}/{input} matched
stakes-agent: {matched}/{input} matched
Outputs:
Tracked changes: {docx_path}
Replay animation: {html_path}
Review JSON: {json_path}
Depth Control
| Depth | Wave 0 | Wave 1 | Wave 2 | Total Agents |
|-------|--------|--------|--------|-------------|
| conservative | genre-scorer | coherence + authority | none | 3 |
| moderate (default) | genre-scorer | coherence + authority + claims + stakes | rhythm + hedge | 7 |
| aggressive | genre-scorer | coherence + authority + claims + stakes | rhythm + hedge + personality + perspective | 9 |
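The depth table can be expressed as a lookup. The agent names and the moderate default come from this document; the constant and function names are hypothetical, and this is a sketch of the mapping rather than the skill's actual dispatch code.

```python
WAVE_0 = ["genre-scorer"]

WAVE_1 = {
    "conservative": ["coherence-agent", "authority-agent"],
    "moderate": ["coherence-agent", "authority-agent", "claims-agent", "stakes-agent"],
    "aggressive": ["coherence-agent", "authority-agent", "claims-agent", "stakes-agent"],
}

WAVE_2 = {
    "conservative": [],  # Wave 2 is skipped at conservative depth
    "moderate": ["rhythm-agent", "hedge-agent"],
    "aggressive": ["rhythm-agent", "hedge-agent", "personality-agent", "perspective-agent"],
}


def agents_for(depth: str = "moderate") -> list[str]:
    """Return every agent that runs at the given depth, in wave order."""
    return WAVE_0 + WAVE_1[depth] + WAVE_2[depth]
```

An unknown depth raises `KeyError`, which is a reasonable failure mode for a mistyped `--depth` value.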
Dry-Run Mode
When --dry-run is specified, the pipeline runs Steps 0-4 (genre scoring, Wave 1 agents, Phase 1 merge) but skips apply_redlines and generate_replay. This gives a fast feedback loop for prompt tuning:
- Runs agents and merge — reports per-agent match rates and edit counts
- Does NOT produce .docx or .html output files
- Useful for: testing agent prompts, checking match rates, verifying verbatim constraint compliance
Tier System
| Tier | Color | Phase | Focus |
|------|-------|-------|-------|
| STRUCTURAL | Blue (#2b6cb0) | 1 | Organization, section flow |
| COHERENCE | Teal (#319795) | 1 | Logic, transitions, causal flow |
| AUTHORITY | Purple (#6b46c1) | 1 | Expertise signals, stakes |
| CRAFT | Orange (#dd6b20) | 2 | Rhythm, precision, density |
| VOICE | Green (#38a169) | 2 | Personality, perspective |
Error Handling
- Agent fails to produce valid JSON: Skip that agent's edits, log warning, continue
- Agent 0% match rate: Warn the operator explicitly. The agent's original_text values didn't match the document. Suggest re-running with opus model or verifying that the full document text was passed. Check the unmatched array in merge output for specifics.
- Match rate below 80%: Warn but don't abort; some edits are still valuable
- No edits from an agent: Normal for well-written documents. Report "0 edits" and continue
- Merge conflict losses: Logged in discarded array with reason — reviewable in the JSON
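The "skip, log warning, continue" behavior for invalid agent JSON could be sketched like this. The function name and logger name are assumptions, and schema validation against references/edit-schema.md is deliberately left out.

```python
import json
import logging
from pathlib import Path

log = logging.getLogger("prose-polish-redline")  # logger name is an assumption


def load_agent_outputs(paths: list[Path]) -> dict[str, object]:
    """Load each agent's JSON file, skipping any that is missing or not valid JSON."""
    loaded: dict[str, object] = {}
    for path in paths:
        try:
            loaded[path.name] = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError) as exc:
            # json.JSONDecodeError and UnicodeDecodeError are both ValueError subclasses.
            log.warning("Skipping %s: %s", path.name, exc)
    return loaded
```

One failing agent then costs only its own edits; the remaining files still reach the merge step.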
Dependencies
- Python 3.10+
- python-docx (pip install python-docx)
Reference Files
- references/edit-schema.md: JSON contract for all agents
- references/tier-mapping.md: Tier definitions and priority order
- references/genre-calibration.md: Genre-specific thresholds