Announce: "Using ds-verify (Phase 5) to confirm reproducibility and completion."
Contents
- The Iron Law of DS Verification
- Red Flags - STOP Immediately If You Think
- The Verification Gate
- Verification Checklist
- Reproducibility Demonstration
- Claims Requiring Evidence
- Insufficient Evidence
- Required Output Structure
- Completion Criteria
Verification Gate
Final verification with reproducibility checks and user acceptance interview.
<EXTREMELY-IMPORTANT>
## The Iron Law of DS Verification
NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION. This is not negotiable.
Before claiming analysis is complete, you MUST:
- RE-RUN - Execute analysis fresh (not cached results)
- CHECK - Verify outputs match expectations
- REPRODUCE - Confirm results are reproducible
- ASK - Interview user about constraints and acceptance
- Only THEN claim completion
This applies even when:
- "I just ran it"
- "Results look the same"
- "It should reproduce"
- "User seemed happy earlier"
If you catch yourself thinking "I can skip verification," STOP - you're about to lie. </EXTREMELY-IMPORTANT>
Red Flags - STOP Immediately If You Think:
| Thought | Why It's Wrong | Do Instead |
|---------|----------------|------------|
| "Results should be the same" | Your "should" isn't verification | Re-run and compare |
| "I ran it earlier" | Your earlier run isn't fresh | Run it again now |
| "It's reproducible" | Your claim requires evidence | Demonstrate reproducibility |
| "User will be happy" | Your assumption isn't their acceptance | Ask explicitly |
| "Outputs look right" | Your visual inspection isn't verified | Check against criteria |
The Verification Gate
Before making ANY completion claim:
1. RE-RUN → Execute fresh, not from cache
2. CHECK → Compare outputs to success criteria
3. REPRODUCE → Same inputs → same outputs
4. ASK → User acceptance interview
5. CLAIM → Only after steps 1-4
Skipping any step is not verification.
Verification Checklist
Technical Verification
Outputs Match Expectations
- [ ] All required outputs generated
- [ ] Output formats correct (files, figures, tables)
- [ ] Numbers are reasonable (sanity checks)
- [ ] Visualizations render correctly
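These checks are easy to automate. A minimal sketch, where the file names, expected columns, and [0, 1] range are hypothetical placeholders for your project's success criteria:
from pathlib import Path
import pandas as pd

def check_outputs(output_dir="outputs"):
    out = Path(output_dir)
    # All required outputs generated (names are illustrative)
    for name in ["summary_table.csv", "figure_1.png"]:
        assert (out / name).exists(), f"Missing required output: {name}"
    # Format correct: the table loads with the expected columns
    summary = pd.read_csv(out / "summary_table.csv")
    assert {"group", "mean", "ci_low", "ci_high"} <= set(summary.columns), "Unexpected table schema"
    # Numbers reasonable: sanity-check known ranges
    assert summary["mean"].between(0, 1).all(), "Means outside expected [0, 1] range"
    # Figure rendered: a non-trivial file size is a cheap proxy
    assert (out / "figure_1.png").stat().st_size > 1_000, "Figure file looks empty"

check_outputs()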
Reproducibility Confirmed
- [ ] Ran analysis twice, got same results
- [ ] Random seeds produce consistent output
- [ ] No dependency on execution order
- [ ] Environment documented (packages, versions)
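For the last item, the environment can be captured programmatically rather than by hand. A small sketch, assuming a standard Python install; swap in your project's actual package list:
import sys
from importlib.metadata import version

def document_environment(packages=("numpy", "pandas", "scikit-learn")):
    """Record the interpreter and key package versions for the report."""
    env = {"python": sys.version.split()[0]}
    for pkg in packages:
        env[pkg] = version(pkg)
    return env

print(document_environment())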
Data Integrity
- [ ] Input data unchanged
- [ ] Row counts traceable through pipeline
- [ ] No silent data loss or corruption
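Row-count traceability is simplest when every pipeline stage logs its count as it runs. A sketch with illustrative stage names and filters:
import pandas as pd

def traced(df, stage, counts):
    """Record the row count at this stage, then pass the frame through."""
    counts[stage] = len(df)
    return df

counts = {}
df = traced(pd.read_csv("input.csv"), "loaded", counts)
df = traced(df.dropna(subset=["outcome"]), "dropped_missing_outcome", counts)
df = traced(df.query("age >= 18"), "adults_only", counts)
print(counts)  # every dropped row is accountedted for by a named stage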
User Acceptance Interview
CRITICAL: Before claiming completion, conduct the user acceptance interview.
Step 1: Replication Constraints
AskUserQuestion:
question: "Were there specific methodology requirements I should have followed?"
options:
- label: "Yes, replicating existing analysis"
description: "Results should match a reference"
- label: "Yes, required methodology"
description: "Specific methods were mandated"
- label: "No constraints"
description: "Methodology was flexible"
If replicating:
- Ask for reference to compare against
- Verify results match within tolerance
- Document any deviations and reasons
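Where a reference exists, "match within tolerance" should be a concrete numeric check, not eyeballing. A minimal sketch, assuming CSV outputs and an illustrative 1e-6 relative tolerance; use whatever the replication target specifies:
import numpy as np
import pandas as pd

ours = pd.read_csv("outputs/estimates.csv").sort_values("parameter")
ref = pd.read_csv("reference/estimates.csv").sort_values("parameter")
assert list(ours["parameter"]) == list(ref["parameter"]), "Parameter sets differ"
np.testing.assert_allclose(
    ours["estimate"].to_numpy(),
    ref["estimate"].to_numpy(),
    rtol=1e-6,
    err_msg="Estimates deviate from the reference beyond tolerance",
)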
Step 2: Results Verification
AskUserQuestion:
question: "Do these results answer your original question?"
options:
- label: "Yes, fully"
description: "Analysis addresses the core question"
- label: "Partially"
description: "Some aspects addressed, others missing"
- label: "No"
description: "Does not answer the question"
If "Partially" or "No":
- Ask which aspects are missing
- Return to /ds-implement to address gaps
- Re-run verification
Step 3: Output Format
AskUserQuestion:
question: "Are the outputs in the format you need?"
options:
- label: "Yes"
description: "Format is correct"
- label: "Need adjustments"
description: "Format needs modification"
Step 4: Confidence in Results
AskUserQuestion:
question: "Do you have any concerns about the methodology or results?"
options:
- label: "No concerns"
description: "Comfortable with approach and results"
- label: "Minor concerns"
description: "Would like clarification on some points"
- label: "Major concerns"
description: "Significant issues need addressing"
Reproducibility Demonstration
MANDATORY: Demonstrate reproducibility before completion.
# Built-in hash() is salted per process (PYTHONHASHSEED), so its values
# can't be compared across sessions; hashlib gives a stable digest that
# can be recorded in the verification report.
import hashlib

def digest(result) -> str:
    # Round floats before serializing if results carry floating-point noise
    return hashlib.sha256(str(result).encode()).hexdigest()

# Run 1
hash1 = digest(run_analysis(seed=42))
# Run 2
hash2 = digest(run_analysis(seed=42))
# Verify
assert hash1 == hash2, "Results not reproducible!"
print(f"Reproducibility confirmed: {hash1} == {hash2}")
For notebooks:
# Clear outputs and re-run all cells from scratch
jupyter nbconvert --execute --inplace notebook.ipynb
# Execute with a fixed random seed via papermill
papermill notebook.ipynb output.ipynb -p seed 42
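To show that two notebook runs actually agree, one option is to compare their executed cell outputs directly. A minimal sketch, assuming two seeded papermill runs as above; reprs containing memory addresses or timestamps would need normalization first:
import json

def notebook_outputs(path):
    """Collect the outputs of every code cell in an executed notebook."""
    with open(path) as f:
        nb = json.load(f)
    return [cell.get("outputs", []) for cell in nb["cells"] if cell["cell_type"] == "code"]

# papermill notebook.ipynb run1.ipynb -p seed 42
# papermill notebook.ipynb run2.ipynb -p seed 42
assert notebook_outputs("run1.ipynb") == notebook_outputs("run2.ipynb"), \
    "Notebook outputs differ between seeded runs"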
Claims Requiring Evidence
| Claim | Required Evidence |
|-------|-------------------|
| "Analysis complete" | All success criteria verified |
| "Results reproducible" | Same output from fresh run |
| "Matches reference" | Comparison showing match |
| "Data quality handled" | Documented cleaning steps |
| "Methodology appropriate" | Assumptions checked |
Insufficient Evidence
These do NOT count as verification:
- Previous run results (must be fresh)
- "Should be reproducible" (demonstrate it)
- Visual inspection only (quantify where possible)
- Single run (need reproducibility check)
- Skipped user acceptance (must ask)
Required Output Structure
## Verification Report: [Analysis Name]
### Technical Verification
#### Outputs Generated
- [ ] Output 1: [location] - verified [date/time]
- [ ] Output 2: [location] - verified [date/time]
#### Reproducibility Check
- Run 1 hash: [value]
- Run 2 hash: [value]
- Match: YES/NO
#### Environment
- Python: [version]
- Key packages: [list with versions]
- Random seed: [value]
### User Acceptance
#### Replication Check
- Constraint: [none/replicating/required methodology]
- Reference: [if applicable]
- Match status: [if applicable]
#### User Responses
- Results address question: [yes/partial/no]
- Output format acceptable: [yes/needs adjustment]
- Methodology concerns: [none/minor/major]
### Verdict
**COMPLETE** or **NEEDS WORK**
[If COMPLETE]
- All technical checks passed
- User accepted results
- Reproducibility demonstrated
[If NEEDS WORK]
- [List items requiring attention]
- Recommended next steps
Completion Criteria
Only claim COMPLETE when ALL are true:
- [ ] All success criteria from SPEC.md verified
- [ ] Results reproducible (demonstrated, not assumed)
- [ ] User confirmed results address their question
- [ ] User has no major concerns
- [ ] Outputs in acceptable format
- [ ] If replicating: results match reference
Both technical and user acceptance must pass. No shortcuts.
Workflow Complete
When user confirms all criteria are met:
Announce: "DS workflow complete. All 5 phases passed."
The /ds workflow is now finished. Offer to:
- Export results to final format
- Clean up .claude/files
- Start a new analysis with /ds