Overview
Apply output-first verification at every step of analysis implementation. This is Phase 3 of the /ds workflow.
Contents
- The Iron Law of DS Implementation - EVERY step MUST produce visible output
- Delegation - Main chat orchestrates, subagents analyze
- What Output-First Means
- Red Flags
- SAS Language Routing - Load SAS enforcement when PLAN.md specifies SAS
- Implementation Process
- Verification Patterns - See references/verification-patterns.md
- Common Failures
- Gate: Exit Implementation
Implementation (Output-First Verification)
Implement analysis with mandatory visible output at every step. NO TDD - instead, every code step MUST produce and verify output.
<EXTREMELY-IMPORTANT>
## The Iron Law of DS Implementation
EVERY CODE STEP MUST PRODUCE VISIBLE OUTPUT. This is not negotiable.
Before moving to the next step, you MUST:
- Run the code
- See the output (print, display, plot)
- Verify output is correct/reasonable
- Document in LEARNINGS.md
- Only THEN proceed to next step
This applies even when YOU think:
- "I know this works"
- "It's just a simple transformation"
- "I'll check results at the end"
- "The code is straightforward"
If you're about to write code without outputting results, STOP. </EXTREMELY-IMPORTANT>
Delegation
<EXTREMELY-IMPORTANT>
**YOU MUST NOT WRITE ANALYSIS CODE IN MAIN CHAT. This is not negotiable.**
You orchestrate. Subagents analyze. For every task in PLAN.md, use the delegation skill:
Read("${CLAUDE_PLUGIN_ROOT}/lib/skills/ds-delegate/SKILL.md")
This is MANDATORY. ds-delegate contains the Task agent templates, output-first protocol details, methodology review patterns, and rationalization prevention. Do not attempt to summarize or shortcut it.
If you're about to write analysis code directly, STOP and read ds-delegate.
If you wrote analysis code in main chat, DELETE it immediately and dispatch a Task agent instead. Code written in main chat is contaminated by orchestrator context and must not be kept. </EXTREMELY-IMPORTANT>
What Output-First Means
| DO | DON'T |
|-------|----------|
| Print shape after each transform | Chain operations silently |
| Display sample rows | Trust transformations work |
| Show summary stats | Wait until end to check |
| Verify row counts | Assume merges worked |
| Check for unexpected nulls | Skip intermediate checks |
| Plot distributions | Move on without looking |
The Mantra: If not visible, it cannot be trusted.
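A minimal sketch of what output-first looks like in pandas (the file name, column, and filter condition are illustrative):

```python
import pandas as pd

df = pd.read_csv("data.csv")  # hypothetical input file
print(f"Loaded: shape={df.shape}")
print(df.dtypes)
print(df.head())

before = len(df)
df = df[df["amount"] > 0]     # the transform under test
print(f"Filter: {before} -> {len(df)} rows "
      f"({(before - len(df)) / before:.1%} removed)")
print(df["amount"].describe())  # sanity-check the surviving values
```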
Red Flags - STOP Immediately
| Thought | Why It's Wrong | Do Instead |
|---------|----------------|------------|
| "I'll check at the end" | STOP - you're letting errors compound silently | Check after every step |
| "This transform is simple" | STOP - simple code can still be wrong | Output and verify |
| "I know merge worked" | STOP - you've assumed this before and been wrong | Check row counts |
| "Data looks fine" | STOP - you're confusing "looks" with verification | Print stats, show samples |
| "I'll batch the outputs" | STOP - you're about to lose your ability to isolate issues | Output per operation |
| "Just a quick plot in main chat" | STOP - you're about to violate delegation | Spawn a Task agent |
Implementation Strategy Choice
After prerequisites pass and PLAN.md is verified, check for parallelization potential:
Skip this choice when:
- PLAN.md has fewer than 4 tasks
- All tasks are dependent (every task is `after N` with no independent groups)
- Tasks form a pipeline (clean → merge → aggregate → model)
- `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS` is not available
Otherwise, ask the user:
AskUserQuestion(questions=[{
"question": "How should we implement the analysis tasks in PLAN.md?",
"header": "Strategy",
"options": [
{"label": "Sequential (Default)", "description": "One task at a time with output-first verification. Safest, most DS work is sequential."},
{"label": "Agent team (parallel)", "description": "Spawn analyst per independent task group. Only for truly independent analysis branches (descriptive stats by subgroup, model comparisons). Requires reconciliation."}
],
"multiSelect": false
}])
If Sequential: Proceed to Implementation Process below.
If Agent team: Skip to Agent Team Implementation (Parallel).
SAS Language Routing
<EXTREMELY-IMPORTANT>
**After reading PLAN.md, check the `Implementation Language` field. If it says SAS or Mixed, you MUST load the SAS performance enforcement BEFORE dispatching any SAS tasks.**
# If PLAN.md contains "Implementation Language: SAS" or "Mixed":
Read("${CLAUDE_PLUGIN_ROOT}/skills/wrds/references/sas-etl.md")
SAS subagent prompts MUST include the following enforcement block (paste into every SAS Task agent prompt):
## SAS Performance Enforcement (Non-Negotiable)
Before writing ANY SAS code, validate against these rules:
### WHERE Clauses
- **NEVER** wrap indexed columns in functions: year(date), month(date), datepart(dt), upcase(), substr()
- **ALWAYS** use range-based date filters: `where date between "01jan&year."d and "31dec&year."d`
- If you write `where year(anything)`, DELETE it and rewrite as BETWEEN.
### Merge Strategy
- Small lookup + large table → hash object (declare hash h; h.defineKey; h.defineData; h.defineDone)
- NEVER use PROC SORT + DATA merge for lookup joins when hash is possible
- Sort-merge ONLY when both tables exceed 50M rows (document justification)
### Parallelism
- Multi-year jobs → SGE array (#$ -t start-end), NOT %do loops
- Pass year via -sysparm, NOT -set or %sysget
- Per-year log files, NOT shared log
### Macro Safety
- Double quotes in h.output() for macro resolution (NEVER single quotes)
- Terminate macro vars with period: &year. not &year
- Assign hash methods to temp vars before put statements
### Self-Check Before Submitting Code
- [ ] No function-wrapped WHERE clauses on indexed columns
- [ ] Hash used for all lookup merges
- [ ] SGE array for multi-year processing
- [ ] Double quotes where macro resolution needed
- [ ] Single-year benchmark before full array
Claiming SAS code is complete without checking the WHERE clause patterns is LYING about code quality. </EXTREMELY-IMPORTANT>
Implementation Process
Step 1: Read Plan, Load ETL Enforcement, and Delegation Skill
Read(".claude/PLAN.md")
Read("${CLAUDE_PLUGIN_ROOT}/lib/skills/ds-delegate/SKILL.md")
Follow the task order defined in the plan. Use ds-delegate's templates for every task.
ETL Strategy Enforcement — load domain-specific references based on PLAN.md:
If PLAN.md contains an ## ETL Strategy section, the user made decisions during planning that MUST be enforced during implementation. Check each subsection and load the corresponding enforcement:
| PLAN.md Section | Enforcement Reference | Inject Into |
|-----------------|----------------------|-------------|
| Implementation Language: SAS or Mixed | Read("${CLAUDE_PLUGIN_ROOT}/skills/wrds/references/sas-etl.md") | Every SAS subagent prompt |
| Filter Strategy table present | Read("${CLAUDE_PLUGIN_ROOT}/lib/skills/ds-implement/references/etl-enforcement.md") § Filter Push-Down | Subagent prompts for data loading tasks |
| Parallelism Plan table present | Read("${CLAUDE_PLUGIN_ROOT}/lib/skills/ds-implement/references/etl-enforcement.md") § Parallelism | Implementation strategy choice |
| Data Flow with intermediates | Read("${CLAUDE_PLUGIN_ROOT}/lib/skills/ds-implement/references/etl-enforcement.md") § Caching | Subagent prompts for tasks producing/consuming intermediates |
| Scale-Up Testing Plan table present | Read("${CLAUDE_PLUGIN_ROOT}/lib/skills/ds-implement/references/etl-enforcement.md") § Scale-Up + domain reference (e.g., gemini-batch/references/scale-up-testing.md) | Before any batch submission task |
If PLAN.md has NO ETL Strategy section: Skip this — proceed directly to Step 2.
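A minimal sketch of this routing check, assuming the PLAN.md section and field names shown in the table above (the plain string matching is illustrative, not a robust parser):

```python
from pathlib import Path

plan = Path(".claude/PLAN.md").read_text()

if "## ETL Strategy" not in plan:
    print("No ETL Strategy section - proceed directly to Step 2")
else:
    if "Implementation Language: SAS" in plan or "Implementation Language: Mixed" in plan:
        print("Load wrds/references/sas-etl.md into every SAS subagent prompt")
    if "Filter Strategy" in plan:
        print("Load etl-enforcement.md § Filter Push-Down")
    if "Parallelism Plan" in plan:
        print("Load etl-enforcement.md § Parallelism")
    if "Scale-Up Testing Plan" in plan:
        print("Load etl-enforcement.md § Scale-Up before any batch submission")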
Step 2: Execute Each Task via Delegation
For each task in PLAN.md:
- Dispatch analyst subagent (per ds-delegate pattern)
- Verify outputs are present and reasonable
- Dispatch methodology reviewer (for statistical tasks)
- Log findings to LEARNINGS.md
Step 3: Log to LEARNINGS.md
Document every significant step:
## Task N: [Description] - COMPLETE
**Input:** [Describe input state]
**Operation:** [What was done]
**Output:**
- Shape: [final shape]
- Key findings: [observations]
**Verification:** [How you confirmed it worked]
**Next:** [What comes next]
Verification Patterns
See references/verification-patterns.md for detailed code patterns for:
- Data loading, filtering, merging
- Aggregation and model training
- Batch pipeline scale-up testing (submission, validation, cost extrapolation)
- Quick reference table by operation type
See references/etl-enforcement.md for ETL strategy enforcement:
- Filter push-down (database vs application vs hybrid)
- Parallelism (Task agents vs SGE vs sequential)
- Intermediate caching (parquet vs CSV vs SQLite)
- Scale-up testing domain routing
Scale-Up Testing Protocol (Batch/ETL Operations)
Triggers when PLAN.md includes a Scale-Up Testing Plan table (created by ds-plan when tasks involve batch APIs, irreversible operations, or >500 items through external services).
<EXTREMELY-IMPORTANT>
### The Iron Law of Scale-Up Testing
NO FULL BATCH WITHOUT A SUCCESSFUL TEST BATCH. This is not negotiable.
This is TDD for ETL: fail at 10 items in minutes, not at 21,000 items in hours. Before submitting production workloads, you MUST validate at small scale and verify outputs are correct — not just "successful." </EXTREMELY-IMPORTANT>
The Protocol
For each task with a scale-up plan in PLAN.md:
Stage 1 — Test Batch (~10 items). ALWAYS required.
- Submit a batch of ~10 representative items
- Wait for completion
- Parse ALL responses — verify non-empty, correct schema, expected structure
- Quality review: read EVERY output yourself (it's only 10 — no sampling needed, no judge needed)
- Gate: Success rate ≥ 90% AND outputs parse correctly AND quality review passes
Stage 2 — Intermediate Batch (~100 items). Required if total > 500.
- Submit ~100 items (include edge cases: large files, unusual formats, boundary conditions)
- Check error rate distribution — are failures random or systematic?
- LLM-as-judge quality review: randomly sample 10 outputs, send each to a stronger model (e.g., Gemini 3 Pro) with scoring rubric. Score: 1 = correct & complete, 0.5 = partially correct, 0 = wrong/empty. Log all scores.
- Extrapolate cost and time for full batch
- Gate: Success rate ≥ 95% AND judge avg quality ≥ 80% AND cost/time extrapolation acceptable AND no systematic failures
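A minimal sketch of the Stage 2 sampling-and-judging step; `judge_score` is a hypothetical placeholder for the call to the stronger judge model with the scoring rubric, stubbed here so the control flow is runnable:

```python
import random

def judge_score(output: str) -> float:
    """Placeholder: send `output` plus the rubric to the judge model.
    Real implementation returns 1 (correct), 0.5 (partial), or 0 (wrong)."""
    return 1.0  # stub

stage2_outputs = [f"output {i}" for i in range(100)]  # stand-in for parsed batch outputs

random.seed(42)                                       # reproducible sample
sample = random.sample(stage2_outputs, k=10)
scores = [judge_score(o) for o in sample]
avg = sum(scores) / len(scores)
print(f"Judge scores: {scores} -> avg {avg:.2f}")
if avg < 0.8:
    raise SystemExit("Stage 2 gate failed: judge avg quality below 80%")
```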
Stage 3 — Large Test Batch (~1,000 items). Required if total > 5,000.
- Submit ~1,000 items
- Verify rate limits are not hit
- LLM-as-judge quality review: randomly sample 20 outputs, judge with same rubric as Stage 2. Compare quality distribution to Stage 2 — any degradation at scale?
- Confirm cost tracking matches extrapolation
- Gate: Success rate ≥ 95% AND judge quality consistent with Stage 2 AND no rate limit issues AND cost confirmed
Full Batch — Submit with confidence.
- Only after all required stages pass their gates
- Document final batch parameters in LEARNINGS.md
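A minimal sketch of a stage gate with cost/time extrapolation (all numbers are illustrative; thresholds follow the gates above):

```python
# Observed at Stage 2 (illustrative values)
stage_items, stage_failures = 100, 3
stage_cost_usd, stage_minutes = 0.42, 18
total_items = 21_000

success_rate = 1 - stage_failures / stage_items
est_cost = stage_cost_usd / stage_items * total_items   # (stage cost / items) × total
est_hours = stage_minutes / stage_items * total_items / 60

print(f"Success rate: {success_rate:.1%}")
print(f"Extrapolated: ${est_cost:.2f}, ~{est_hours:.1f} h for {total_items} items")
if success_rate < 0.95:
    raise SystemExit("Gate failed: fix systematic errors before scaling up")
```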
Scale-Up Rationalization Table - STOP If You Think:
| Excuse | Reality | Do Instead |
|--------|---------|------------|
| "I already tested the prompt interactively" | Interactive ≠ batch. Schema, format, and parameters differ. Batch-specific bugs only appear in batch. | Run a 10-item batch test. It takes 5 minutes. |
| "The first 10 worked, 21K will be fine" | Stage 1 catches format errors. Stage 2 catches edge cases and error rate patterns. Skipping stages hides systematic failures. | Follow ALL required stages for your batch size. |
| "I'll check the results after the full batch" | By then you've wasted hours/dollars. Errors compound silently at scale. | Verify at each stage before scaling up. |
| "The API returned 200 OK, so it worked" | HTTP 200 means the request succeeded, not the output. Empty responses, malformed JSON, and hallucinated content all return 200. | Parse and inspect actual response content. |
| "Running test batches slows down the pipeline" | A 10-item test takes minutes. Resubmitting 21K items after a schema error takes hours. | Test batches are the fastest path to production. |
| "The outputs parsed correctly, so quality is fine" | Parsing checks structure, not content. LLM outputs can be structurally valid but factually wrong, incomplete, or hallucinated. | Randomly sample and actually read outputs at every stage. |
Red Flags - STOP If You Catch Yourself:
| Action | Why Wrong | Do Instead |
|--------|-----------|------------|
| About to submit full batch without a test batch | You will discover errors at maximum cost | Submit 10 items first |
| Checking only that the API returned success | "Success" means the request was processed, not that output is correct | Parse and read actual response content |
| Skipping Stage 2 because Stage 1 passed | Edge cases and error rate patterns only emerge at ~100 items | Follow the scale-up plan from PLAN.md |
| Not extrapolating cost before scaling up | Full batch could cost 10x what you expected | Calculate: (stage cost / stage items) × total items |
Common Failures to Avoid
| Failure | Why It Happens | Prevention |
|---------|----------------|------------|
| Silent data loss | Merge drops rows | Print row counts before/after |
| Hidden nulls | Join introduces nulls | Check null counts after joins |
| Wrong aggregation | Groupby logic error | Display sample groups |
| Type coercion | Pandas silent conversion | Verify dtypes after load |
| Off-by-one | Date filtering edge cases | Print min/max dates |
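A minimal sketch guarding against the first two failures with standard pandas merge options (the frames are illustrative):

```python
import pandas as pd

left = pd.DataFrame({"key": [1, 2, 3], "x": [10, 20, 30]})
right = pd.DataFrame({"key": [1, 2], "y": ["a", "b"]})

print(f"Before: left={left.shape}, right={right.shape}")
merged = left.merge(right, on="key", how="left",
                    validate="m:1",        # fail fast on unexpected duplicate keys
                    indicator=True)        # track match provenance per row
print(f"After: {merged.shape}")
print(merged["_merge"].value_counts())     # how many rows failed to match?
print(f"Nulls introduced: {int(merged['y'].isna().sum())}")
```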
If Output Looks Wrong
- STOP - do not proceed
- Investigate - print more details
- Document - log the issue in LEARNINGS.md
- Ask - if unclear, ask user for guidance
- Fix - only proceed after output verified
Never hide failures. Bad output documented is better than silent failure.
No Pause Between Tasks
<EXTREMELY-IMPORTANT>
**After completing task N, IMMEDIATELY start task N+1. You MUST NOT pause.**
| Thought | Reality |
|---------|---------|
| "Task done, should check in with user" | You're wasting context. User wants ALL tasks done. Keep going. |
| "User might want to see intermediate results" | You're assuming wrong. User will see results at the END. Continue. |
| "Natural pause point" | You're making excuses. Only pause when ALL tasks complete or you're blocked. |
| "Should summarize this step" | You're procrastinating. Summarize AFTER all tasks. Keep moving. |
Your pausing between tasks is procrastination disguised as courtesy. </EXTREMELY-IMPORTANT>
Agent Team Implementation (Parallel)
### 1. Prerequisites Check
Before spawning any teammates:
- Verify PLAN.md exists with a task list and `Deps` column annotations
- Group tasks by independence:
  - Tasks with `Deps: —` (no dependencies) can run in parallel
  - Tasks with `Deps: after N` form dependency chains — keep these sequential within one teammate
  - Group dependent chains into a single teammate assignment
- Verify data scope separation:
  - Each independent task/group should analyze different datasets OR different subsets
  - If two independent tasks modify the same output files, merge them into one group (prevents conflicts)
  - Example: Analysis by year (2020, 2021, 2022) = independent. Pipeline (clean → merge → model) = sequential.
- Confirm at least 2 independent groups exist — otherwise fall back to sequential
Example grouping from a PLAN.md:
Task 0: Data cleaning (Deps: —) → Run sequentially FIRST (foundation task)
Task 1: Descriptive stats by region (Deps: after 0) → Teammate A
Task 2: Descriptive stats by year (Deps: after 0) → Teammate B (parallel with A)
Task 3: Logit model (Deps: after 0) → Teammate C (parallel with A, B)
Task 4: Probit model (Deps: after 0) → Teammate D (parallel with A, B, C)
Task 5: Comparison table (Deps: after 3, 4) → Run sequentially AFTER (needs all results)
Foundation tasks (like data cleaning or ETL) that everything depends on must complete BEFORE spawning parallel teammates. Run these sequentially first using normal ds-delegate loops.
Reconciliation tasks (like comparison tables) that need all parallel results must run AFTER teammates complete.
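A minimal sketch of this grouping logic; the `tasks` dict stands in for parsed `Deps` annotations and mirrors the example above:

```python
# task number -> list of tasks it depends on (empty list = Deps: —)
tasks = {0: [], 1: [0], 2: [0], 3: [0], 4: [0], 5: [3, 4]}

foundation = [t for t, deps in tasks.items() if not deps]
dependents = {t: deps for t, deps in tasks.items() if deps}

# Tasks whose deps are all foundation tasks can run as parallel teammates;
# tasks depending on parallel results are reconciliation tasks (run after).
parallel = [t for t, deps in dependents.items()
            if all(d in foundation for d in deps)]
reconcile = [t for t in dependents if t not in parallel]

print(f"Foundation (run first): {foundation}")   # [0]
print(f"Parallel teammates:     {parallel}")     # [1, 2, 3, 4]
print(f"Reconciliation (after): {reconcile}")    # [5]
```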
### 2. Create Shared Task List and Enter Delegate Mode
- Run foundation tasks first (any task that all others depend on) using sequential ds-delegate
- After foundation tasks complete, create one `TaskCreate` per independent task/group:
  - Subject: `Analyze: [Task Name(s)]`
  - Description: task details, data scope, expected outputs
- Press Shift+Tab to enter delegate mode — the lead coordinates, does NOT analyze
- Spawn one teammate per task/group
### 3. Spawn Prompt Template
Each teammate receives this self-contained prompt. Teammates start with a blank conversation and do NOT auto-load skills. The prompt must contain everything they need.
Before spawning, substitute these variables:
- `TASK_NAME` → task name(s) from PLAN.md
- `TASK_DETAILS` → full task text pasted from PLAN.md (not a file reference)
- `SPEC_CONTEXT` → relevant section of SPEC.md pasted inline (objective, methodology requirements)
- `DATA_SCOPE` → specific datasets/subsets this teammate may use (prevents conflicts)
- `OUTPUT_FILES` → specific output files this teammate will create (prevents overwrites)
- `PREVIOUS_WORK` → relevant entries from LEARNINGS.md (foundation task results)
- `PLUGIN_ROOT` → resolved value of `${CLAUDE_PLUGIN_ROOT}`
You are implementing one analysis task as part of a data science team. You have EXCLUSIVE
ownership of the data scope and output files listed below. Do not modify outputs outside your scope.
## Your Assignment
Task: {TASK_NAME}
### Task Details (from PLAN.md)
{TASK_DETAILS}
### Analysis Objective (from SPEC.md)
{SPEC_CONTEXT}
### Previous Work (from LEARNINGS.md)
{PREVIOUS_WORK}
## Data Scope (EXCLUSIVE — do not use data outside this scope)
{DATA_SCOPE}
If you discover you need data NOT in your scope, STOP and message the lead:
"Need access to [dataset/subset] which is outside my scope. Reason: [why]."
## Output Files (EXCLUSIVE — do not modify files outside this list)
{OUTPUT_FILES}
If you need to create an output file NOT in this list, STOP and message the lead.
## Iron Law of Output-First Verification (Non-Negotiable)
**EVERY CODE STEP MUST PRODUCE VISIBLE OUTPUT. This is not negotiable.**
Before moving to the next step, you MUST:
1. Run the code
2. See the output (print, display, plot)
3. Verify output is correct/reasonable
4. Document what you observed
5. Only THEN proceed to next step
**If you're about to write code without outputting results, STOP.**
### What Output-First Means
| DO | DON'T |
|-------|----------|
| Print shape after each transform | Chain operations silently |
| Display sample rows | Trust transformations work |
| Show summary stats | Wait until end to check |
| Verify row counts | Assume merges worked |
| Check for unexpected nulls | Skip intermediate checks |
| Plot distributions | Move on without looking |
### Rationalization Prevention
| Thought | Reality |
|---------|---------|
| "I'll check at the end" | STOP — you're letting errors compound silently. Check after every step. |
| "This transform is simple" | STOP — simple code can still be wrong. Output and verify. |
| "I know merge worked" | STOP — you've assumed this before and been wrong. Check row counts. |
| "Data looks fine" | STOP — you're confusing "looks" with verification. Print stats, show samples. |
| "I'll batch the outputs" | STOP — you're about to lose your ability to isolate issues. Output per operation. |
## Step 1: Load Analysis Protocol
Read("{PLUGIN_ROOT}/lib/skills/ds-delegate/SKILL.md")
This contains the detailed output-first protocol and verification patterns.
## Step 2: Implement with Output-First Protocol
For EVERY operation (load, filter, merge, transform, model):
1. **BEFORE:** Print state (shape, head, dtypes)
2. **EXECUTE:** Run operation
3. **AFTER:** Print state (shape, nulls, sample)
4. **VERIFY:** Check output is reasonable
5. **DOCUMENT:** Note what you observed
Example:
```python
print(f"Before merge: df1={df1.shape}, df2={df2.shape}")
df = df1.merge(df2, on='key', how='left')
print(f"After merge: df={df.shape}")
print(f"Nulls introduced: {df.isnull().sum().sum()}")
print(df.head())
```
### Required Outputs by Operation
| Operation | Required Output |
|-----------|-----------------|
| Load data | shape, dtypes, head() |
| Filter | shape before/after, % removed |
| Merge/Join | shape, null check, sample |
| Groupby | result shape, sample groups |
| Model fit | metrics, convergence check |
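Example for the "Model fit" row (synthetic data; statsmodels shown as one possible library, whose Logit results expose `mle_retvals['converged']`):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(500, 2)))       # intercept + 2 regressors
y = (X @ np.array([0.2, 1.0, -0.5]) + rng.normal(size=500) > 0).astype(int)

res = sm.Logit(y, X).fit(disp=0)
print(f"Converged:  {res.mle_retvals['converged']}")  # convergence check
print(f"Pseudo R^2: {res.prsquared:.3f}")             # fit metric
print(res.params)                                     # coefficients
```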
## Step 3: Save Outputs to Your Scope
Save all analysis outputs (plots, tables, model objects) to the files in OUTPUT FILES.
Use clear naming:
- Plots: `{OUTPUT_FILES[0]}/plot_distribution.png`
- Tables: `{OUTPUT_FILES[0]}/table_summary.csv`
- Models: `{OUTPUT_FILES[0]}/model_logit.pkl`
## Step 4: Message the Lead
After completing your analysis, send a message to the lead with:
Finished: {TASK_NAME}
Outputs created:
- [list each file with brief description]
Data quality observations:
- [any nulls, outliers, or data issues found]
- [or "No issues" if clean]
Methodology notes:
- [any assumptions made about statistical approach]
- [or "Standard approach per spec" if straightforward]
Key findings:
- [1-3 bullet points of main results]
The lead uses these messages to check for methodology inconsistencies between teammates. Do NOT message other teammates directly — the lead coordinates all cross-task communication.
## Step 5: Self-Verification Checklist
Before marking your task complete, verify ALL of the following:
- [ ] Every operation produced visible output
- [ ] All outputs saved to files in OUTPUT FILES
- [ ] Only data in DATA SCOPE was used
- [ ] No silent data loss (row counts checked before/after merges)
- [ ] No unexpected nulls introduced
- [ ] Methodology matches SPEC.md requirements (re-read spec context above)
- [ ] Key findings documented in message to lead
Only mark your task complete after all boxes pass.
## If You Encounter Issues
- Need data outside scope: Message lead, do NOT access it
- Output looks wrong: STOP, investigate, document in message to lead
- Blocked by missing data: Message lead: "Blocked — need [data/result] from foundation task"
- Unclear methodology: Message lead with specific question. Do NOT guess.
- Code error: Debug with output-first approach (print state at each step)
### 4. Lead Monitoring
While teammates analyze:
- **Watch the shared task list** for completion status and messages
- **If a teammate reports a methodology question:** Relay the answer to ALL affected teammates (e.g., "Use robust standard errors for all regressions")
- **If a teammate reports data quality issues:** Decide whether to halt parallel work and fix foundation task
- **If a teammate requests out-of-scope data:** Decide whether to expand scope or note as limitation
- **If a teammate has been working significantly longer than others:** Message them for status
- **Do NOT implement any analysis yourself** — your job is coordination and reconciliation
### 5. Reconciliation Protocol (3 Passes)
After ALL teammates mark their tasks complete, the lead performs three passes:
<EXTREMELY-IMPORTANT>
**Pass 1 — Collect & Conflicts:**
1. Read all output files created by teammates
2. Check for file conflicts:
- Did teammates overwrite each other's outputs? (should not happen if scope separation worked)
- Are output file names clear and non-overlapping?
3. Verify all expected outputs exist (cross-check against each teammate's completion message)
4. If outputs are missing or conflict, identify which teammate and request fix
**Pass 2 — Output Verification:**
1. For each teammate's outputs, verify:
- Data shapes are reasonable (no unexpected empty DataFrames)
- Summary statistics make sense (no all-zeros, no suspicious outliers)
- Plots render correctly (no blank images)
- Model convergence achieved (if applicable)
2. Cross-check outputs against SPEC.md requirements:
- Did we get all the analyses requested?
- Are output formats what spec required (tables vs plots)?
3. If outputs look wrong:
- Identify specific issue (e.g., "Model failed to converge", "Plot shows no data")
- Dispatch fix subagent using ds-delegate targeting the specific issue
- Re-verify after fix
**Pass 3 — Methodology Consistency:**
1. Read each teammate's completion message (methodology notes section)
2. Check for methodology conflicts:
- Did teammates make different assumptions about the same thing? (e.g., one used robust SE, another didn't)
- Did teammates use different variable definitions? (e.g., one logged income, another didn't)
- Did teammates handle nulls differently? (one dropped, another imputed)
3. Read SPEC.md methodology requirements — verify all teammates followed spec
4. If methodology conflicts found:
- Document the conflict
- Decide on canonical approach (from spec or user input)
- Dispatch fix subagent to harmonize methodology
- Re-run affected analyses
**If ANY pass fails → fix before proceeding. Do NOT skip reconciliation passes.**
</EXTREMELY-IMPORTANT>
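A minimal sketch supporting Passes 1-2: confirm each teammate's expected outputs exist, are non-empty, and load (the file lists are illustrative):

```python
from pathlib import Path
import pandas as pd

expected = {
    "Teammate A": ["outputs/region/table_summary.csv"],   # hypothetical paths
    "Teammate B": ["outputs/year/table_summary.csv"],
}

for teammate, files in expected.items():
    for f in files:
        p = Path(f)
        assert p.exists(), f"{teammate}: missing {f}"
        assert p.stat().st_size > 0, f"{teammate}: empty file {f}"
        if p.suffix == ".csv":
            df = pd.read_csv(p)
            assert not df.empty, f"{teammate}: {f} loaded but has no rows"
            print(f"{teammate}: {f} OK, shape={df.shape}")
```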
### 6. When to Use Agent Teams
<EXTREMELY-IMPORTANT>
**DS work is MOSTLY sequential. Only use parallel mode for rare cases with true independence.**
**Use agent teams when:**
- 4+ tasks in PLAN.md with at least 2 independent groups
- Independent tasks analyze different datasets OR different subsets of same data
- Tasks create different output files (no overlap)
- Tasks are self-contained (each has own data scope and outputs)
- Examples:
- ✅ Descriptive stats by subgroup (region A, region B, region C)
- ✅ Model comparisons (logit, probit, random forest on same cleaned data)
- ✅ Robustness checks (main spec, alternative spec 1, alternative spec 2)
- ✅ Multiple visualizations (time series plot, scatter plot, correlation heatmap)
**Do NOT use agent teams when:**
- Tasks form a pipeline (clean → merge → transform → model) — this is SEQUENTIAL
- Each task depends on seeing the previous output to decide next step — this is EXPLORATORY
- Multiple tasks modify the same datasets or output files
- Fewer than 4 tasks (overhead exceeds benefit)
- Tasks require shared state that's built incrementally
- Exploratory analysis where you don't know what's needed until you see data
- `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS` is not available
**The default is sequential.** Parallel is the exception, not the rule.
</EXTREMELY-IMPORTANT>
### 7. After Reconciliation
After all three reconciliation passes complete successfully:
1. **Log consolidated results to LEARNINGS.md:**
```markdown
## Parallel Analysis Complete
**Tasks executed in parallel:**
- Task N: [Name] - Teammate A
- Task M: [Name] - Teammate B
- Task P: [Name] - Teammate C
**Outputs created:**
- [List all output files across teammates]
**Methodology:**
- [Document canonical approach used across analyses]
**Key findings:**
- [Consolidated findings from all teammates]
**Reconciliation:**
- Pass 1 (Collect): [Any conflicts? How resolved?]
- Pass 2 (Outputs): [Any issues? How fixed?]
- Pass 3 (Methodology): [Any conflicts? How harmonized?]
```
2. **Proceed to next phase** (either next sequential task or ds-review if all tasks complete)
Gate: Exit Implementation
<EXTREMELY-IMPORTANT>
**You MUST NOT proceed to review without verifying ALL tasks are complete. This is not negotiable.**
Before invoking ds-review, execute this gate:
- IDENTIFY: Read `.claude/PLAN.md` — list every task by number and name
- RUN: Read `.claude/LEARNINGS.md` — find entries for each task
- READ: For each task, confirm LEARNINGS.md contains:
  - A "Task N: [Name] - COMPLETE" entry
  - Verified output (shape, stats, or sample)
  - No unresolved issues flagged
- VERIFY: Count tasks in PLAN.md vs completed entries in LEARNINGS.md. They MUST match.
- CLAIM: Only if all tasks accounted for, proceed to review
If ANY task is missing from LEARNINGS.md, implement it before proceeding.
Claiming all tasks are done without checking LEARNINGS.md against PLAN.md is LYING. </EXTREMELY-IMPORTANT>
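A minimal sketch of the gate's count check; the regexes assume the entry formats shown above and may need adjusting to the actual PLAN.md layout:

```python
import re
from pathlib import Path

plan = Path(".claude/PLAN.md").read_text()
learnings = Path(".claude/LEARNINGS.md").read_text()

# Assumed formats: PLAN.md headings like "## Task 3: ...",
# LEARNINGS.md entries like "## Task 3: [Name] - COMPLETE".
planned = set(re.findall(r"^#+\s*Task (\d+)", plan, flags=re.M))
complete = set(re.findall(r"^## Task (\d+):.*- COMPLETE", learnings, flags=re.M))

missing = sorted(planned - complete, key=int)
print(f"Planned: {len(planned)}, complete: {len(complete)}, missing: {missing}")
assert not missing, f"Tasks {missing} lack COMPLETE entries - implement before review"
```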
Phase Complete
After passing the exit gate, IMMEDIATELY invoke:
Read("${CLAUDE_PLUGIN_ROOT}/lib/skills/ds-review/SKILL.md")