Agent Skills: Harness - Spec-Driven Parallel Development

하네스, 테스트 주도 팀, TDD 팀, 병렬 개발, 스펙 구현 - Spec/PRD-driven parallel development using Agent Teams. Generates tests first, then orchestrates teammates to make all tests pass. Use when implementing a feature or project from a spec with agent teams.

UncategorizedID: aimskr/aims-toolkit/harness

Install this agent skill to your local

pnpm dlx add-skill https://github.com/aimskr/aims-claude-toolkit/tree/HEAD/skills/harness

Skill Files

Browse the full folder contents for harness.

Download Skill

Loading file tree…

skills/harness/SKILL.md

Skill Metadata

Name
harness
Description
"하네스, 테스트 주도 팀, TDD 팀, 병렬 개발, 스펙 구현 - Spec/PRD-driven parallel development using Agent Teams. Generates tests first, then orchestrates teammates to make all tests pass. Use when implementing a feature or project from a spec with agent teams."

Harness - Spec-Driven Parallel Development

You are a team lead running a test-driven harness. Your ONE goal: make all tests pass.

Phase 0: Spec Intake

  1. Ask user for spec location (file path, URL, or inline)
  2. Check if ~/.claude/skills/harness/LESSONS_LEARNED.md exists. If yes, read it and incorporate relevant prevention rules into the project's PROGRESS.md Mistake Log header as pre-loaded rules.
  3. Read the provided spec/PRD document
  4. Analyze existing project structure to detect tech stack and conventions:
    • Language, framework, test runner (pytest, jest, JUnit, etc.)
    • Directory conventions (src/, app/, lib/, etc.)
    • Existing test patterns if any
  5. Identify all modules/components to implement
  6. Define module boundaries and file ownership map (adapt to actual project structure):
    • Module A → [actual path pattern] → implementer-1
    • Module B → [actual path pattern] → implementer-2
    • Tests → [test path pattern] → tester
    • (adjust module count to match actual spec)
  7. Create PROGRESS.md at project root:
# Project Progress

## Status: PHASE 0 - SPEC INTAKE
## Pass Rate: 0/0 (0%)
## Last Updated: [timestamp]
## Iteration: 0 / max 10

## Module Ownership

| Module | Tests | Pass | Fail | Owner | Status |
|--------|-------|------|------|-------|--------|
| [module] | 0 | 0 | 0 | [owner] | not_started |

## Recent Progress
- [timestamp] Project initialized from spec

## Blocking Issues
- (none)

## Failed Approaches (DO NOT RETRY)
- (none)

## Mistake Log

| # | Iteration | Category | What Happened | Root Cause | Prevention Rule |
|---|-----------|----------|---------------|------------|----------------|
| (none) | | | | | |

Categories: `process` (workflow/decision error), `technical` (wrong approach/implementation), `communication` (misassignment/unclear task), `scope` (over/under-engineering)
  1. Confirm module breakdown with user before proceeding

Phase 1: Test Suite Generation

The lead writes tests directly in this phase. This is the only phase where the lead writes code - all subsequent implementation is delegated to teammates.

For each module identified in the spec:

  1. Create test files using the project's existing test runner and conventions
  2. For each spec requirement, write at least one test covering:
    • Happy path (normal operation)
    • Edge cases (boundary values, empty inputs)
    • Error cases (invalid inputs, failures)
    • Integration points (module interactions)
  3. Each test must have a descriptive name mapping to a spec requirement
  4. Tests must verify general correctness, not specific outputs. Write tests that validate behavior for all valid inputs, not just hard-coded examples. Tests are for verifying correctness, not for defining the solution.
  5. Run all tests → confirm ALL FAIL (this is the correct starting state)
  6. Count total tests → update PROGRESS.md

Test Result Reporting

The tester agent summarizes results in this format (parsed from the project's actual test runner output):

=== TEST RESULTS ===
PASS: 0/[total] (0%)
FAIL: [total]

[FAIL BY MODULE]
  [module-a]:  0/[n]  ([n] FAIL)
  [module-b]:  0/[n]  ([n] FAIL)

[TOP 5 ERRORS]
  ERROR [module]/[test_file]:[line] - [one-line description]
  ERROR [module]/[test_file]:[line] - [one-line description]

Reporting rules:

  • Summary ≤ 10 lines
  • ERROR keyword on same line as reason (for grep)
  • Module-level aggregation
  • Full stack traces → log file only
  • NEVER print per-test output for passing tests
  • Incremental progress during long test runs: print only every 25th test to avoid context window pollution

Test Execution Strategy

Three tiers of test execution to balance speed and coverage:

  • Quick check (after each implementer change): Run only the target module's tests
  • Smoke test (each loop iteration): Run a deterministic 10% random sample of the full suite. Each agent uses a different seed so collective coverage is high while individual runs stay fast.
  • Full suite (exit condition judgment only): Run all tests. Use this ONLY when PROGRESS.md shows 100% on smoke tests or when explicitly deciding to exit the loop.

Message tester with the appropriate tier:

Run [quick|smoke|full] test suite for [module|all].
  1. Commit: test: initial test suite (0/N passing)
  2. Update PROGRESS.md status to PHASE 1 COMPLETE

Phase 2: Team Setup

Inform user: "Phase 2 시작: delegate mode (Shift+Tab)로 전환해주세요."

Create agent team with specialized teammates:

Team Creation Prompt

Create an agent team for harness development.
Spawn teammates:
- "researcher" using researcher agent - codebase exploration and investigation
- "impl-1" using implementer agent - owns [Module A files]
- "impl-2" using implementer agent - owns [Module B files]
- "impl-3" using implementer agent - owns [Module C files]
- "tester" using tester agent - owns tests/**, runs test suite
- "reviewer" using reviewer agent - reviews completed modules

(adjust implementer count to match module count, max 4 concurrent implementers)

Sub-agent Usage Guidelines

Not every task needs a sub-agent. Follow these rules:

  • Direct (lead does it): Single-file grep/search, reading a file for context, simple PROGRESS.md updates, quick git operations
  • Delegate to sub-agent: Multi-file implementation, cross-module investigation, full test suite execution, code review
  • Researcher: Invoke ONLY when a module is stalled for 2+ rounds. Do NOT invoke for simple "what does this file do" questions.
  • Concurrency cap: Maximum 4 implementers active simultaneously. If all slots are full, queue new tasks until a slot frees up.

Initial Task Creation

  1. Group failing tests by module (3-5 tests per task)
  2. Create tasks with TaskCreate for each group
  3. Assign to corresponding implementer
  4. Set dependencies with addBlockedBy if modules depend on each other
  5. Each task description must include:
    • Which tests to make pass
    • File ownership boundaries
    • Reference to PROGRESS.md Failed Approaches section

Spawn Context for Each Implementer

When spawning, include in the prompt:

You own [actual/path/pattern/**] exclusively. NEVER edit files outside this path.
Your goal: make the following tests pass: [list of test names]
Read PROGRESS.md for context, especially "Failed Approaches" section.
After making changes, message tester to run tests.
When done, message the team lead.

RULES:
- Implement the MINIMUM change needed to pass the target tests. Do not refactor, 
  add abstractions, create helper utilities, or "improve" code beyond what is required.
- If you want to create a new file or abstraction layer, message the lead first.
- NEVER hard-code values or write solutions that only work for specific test inputs.
  Implement real logic that solves the problem generally. Tests verify correctness; 
  they do not define the solution.
- Do not remove, weaken, or skip any existing tests.

CONTEXT MANAGEMENT:
- Do NOT stop work early due to context window concerns. Your context will be 
  automatically managed.
- If you sense context is running low, save your current state to PROGRESS.md 
  (what you changed, what remains, which tests now pass) BEFORE your context resets.
- On a fresh session, ALWAYS start by reading: PROGRESS.md, git log --oneline -20, 
  and the failing test output for your module.

Update PROGRESS.md status to PHASE 2 - TEAM ACTIVE

Phase 3: Harness Loop

THIS IS THE CORE. Repeat until all tests pass or max iterations reached.

Step 1: Test

Message tester with the appropriate tier (see Test Execution Strategy):

Run [quick|smoke|full] test suite and report results:
PASS: N/Total (%)
FAIL: N
[FAIL BY MODULE] with counts
[TOP 5 ERRORS] with file:line and one-line reason

Default to smoke for regular iterations. Use full only at exit condition.

Step 2: Analyze

When tester reports results:

  • Parse pass/fail counts per module
  • Compare with previous PROGRESS.md numbers
  • Identify: improved modules, stalled modules, regressions

Step 3: Decide & Act

IF regression detected (pass count decreased):

→ Message responsible implementer IMMEDIATELY:
  "REGRESSION in [module]: [test names] now failing that were passing.
   Fix regressions before any new work. Check git diff for recent changes."
→ Do NOT assign new tasks until regression is fixed

IF module stalled (no progress for 2+ rounds):

→ Message researcher: "Investigate blocking issues in [module].
   Check tests/[module]/ for what's failing and why."
→ After researcher reports, decide:
  - Add findings to PROGRESS.md Failed Approaches
  - Create more specific tasks with researcher's insights
  - OR reassign module to a different implementer

IF test itself is wrong (implementation is correct but test has bad assumptions):

→ Confirm with user: "Test [name] assumes [X], but spec says [Y]. Adjust test?"
→ Only modify tests with user approval
→ Log in PROGRESS.md: "Test [name] adjusted: [reason]"

IF module complete (all module tests pass):

→ Message reviewer: "Review completed module at src/[module]/
   Check: correctness, conventions, security, edge cases, generalizability.
   Verify: no hard-coded values, no test-specific shortcuts, no unnecessary abstractions."
→ Update PROGRESS.md module status to "review"
→ Reassign implementer to next queued module with most failing tests
→ Update file ownership in PROGRESS.md

IF reviewer finds issues:

→ Create fix tasks from reviewer findings
→ Assign to the module's implementer
→ Critical issues block the module from "done" status
→ If reviewer flags hard-coded values or test-specific shortcuts: 
  mark as CRITICAL, require general solution before proceeding

IF new tasks needed:

→ Group remaining failing tests by module (3-5 per task)
→ Include in task description:
  - Specific test names to make pass
  - File ownership boundary reminder
  - Failed Approaches to avoid
→ Assign to corresponding implementer

Parallelism Strategy by Pass Rate

Adapt task distribution based on overall pass rate:

  • 0-80% (early stage): Group tests by module, assign each module to a dedicated implementer. Maximum parallelism. This is the default strategy.
  • 80-95% (mid stage): Cluster remaining failures by dependency graph. Assign independent clusters to separate implementers. Overlapping clusters go to a single implementer to avoid conflicts.
  • 95-100% (late stage): Assign researcher to investigate each remaining failure FIRST. Only after root cause is identified, assign a single implementer to fix sequentially. Multiple implementers at this stage cause more merge conflicts than they save time.

Step 4: Track

Update PROGRESS.md after EVERY loop iteration:

  • Increment iteration counter
  • Module pass rates (the numbers)
  • Recent Progress log (what changed)
  • Blocking Issues (if any)
  • Failed Approaches (new discoveries)
  • Mistake Log (any mistakes detected this iteration)

Mistake Recording Protocol: Whenever ANY of the following occurs, add an entry to the Mistake Log table in PROGRESS.md:

  • Regression introduced (category: technical)
  • Implementer worked on wrong files or wrong tests (category: communication)
  • Hard-coded solution submitted (category: technical)
  • Over-engineered solution submitted (category: scope)
  • Stall caused by unclear task description (category: communication)
  • Test was wrong and required user adjustment (category: process)
  • Duplicate work by multiple implementers (category: process)
  • Sub-agent spawned unnecessarily (category: process)

Each entry MUST include a concrete Prevention Rule — a one-line actionable rule that, if followed, would have prevented this mistake. These prevention rules accumulate and are referenced by all agents.

At the START of every loop iteration, the lead MUST scan the Mistake Log and remind relevant implementers of applicable prevention rules in their task descriptions.

Step 5: Commit

After EVERY loop iteration:

  • Each implementer commits on test pass with message: feat(module): pass test_name_1, test_name_2
  • Lead commits PROGRESS.md changes: chore: update progress - iteration N, pass rate X%
  • On regression fix: fix(module): restore test_name after regression
  • Use git log as a secondary state tracking mechanism alongside PROGRESS.md

Step 6: Loop Guard

IF iteration count >= 10:

→ Pause and report to user:
  "10회 반복 도달. 현재 pass rate: N%. 계속할까요?"
  - Show remaining failing tests
  - Show stalled modules
  - Let user decide: continue (reset counter), adjust tests, or stop

Go back to Step 1. Do not skip steps.

Context Window Management

These rules apply to ALL agents (lead and sub-agents) throughout the harness loop:

  • Do NOT terminate work early due to context window budget concerns. Context is automatically compressed or refreshed.
  • Before context resets, save current state to PROGRESS.md: what was done, what remains, which tests pass/fail.
  • On fresh session start, orient by reading: PROGRESS.md, git log --oneline -20, and running a quick test to assess current state.
  • Do NOT re-derive information that is already tracked in PROGRESS.md or git log.

Exit Condition

When PROGRESS.md shows 100% pass rate across all modules:

1. Message tester: "Run FULL test suite"
2. If any failures → create fix tasks, continue loop
3. If 100% pass confirmed:
   → Message reviewer: "Final review of entire codebase.
      Check especially: hard-coded values, test-specific shortcuts, 
      unnecessary abstractions, code that only works for test inputs."
   → Wait for reviewer report
   → If issues: create fix tasks, continue loop
   → If clean: proceed to Phase 4

Phase 4: Convergence

  1. Run full test suite one final time → confirm 100% pass
  2. Update PROGRESS.md status to COMPLETE
  3. Extract lessons learned: review the Mistake Log and compile generalizable prevention rules
  4. Append prevention rules to ~/.claude/skills/harness/LESSONS_LEARNED.md (persistent across projects):
## [Project Name] - [Date]

### Mistakes & Prevention Rules
- [Prevention Rule 1 from Mistake Log]
- [Prevention Rule 2 from Mistake Log]

### What Worked Well
- [effective pattern observed during this project]
  1. Final commit: chore: mark project COMPLETE - all tests passing
  2. Shut down all teammates gracefully (shutdown_request)
  3. Clean up team (TeamDelete)
  4. Report to user:
    • Final pass rate
    • Modules implemented
    • Total iterations used
    • Key decisions made
    • Mistake summary (total count by category, top prevention rules)
    • Any known limitations

Lead Rules (MUST FOLLOW)

  1. NEVER implement code yourself except in Phase 1 (test generation)
  2. NEVER skip the test step between implementation rounds
  3. ALWAYS update PROGRESS.md after each loop iteration
  4. ALWAYS commit after each loop iteration (see Step 5: Commit)
  5. ALWAYS warn implementers about Failed Approaches before they start
  6. Group failing tests into tasks of 3-5 (not 1, not 20)
  7. Reassign if an implementer is stuck for 3+ rounds
  8. Broadcast only for critical blocking issues that affect all teammates
  9. Regression = highest priority. Stop new work until regressions are fixed
  10. File ownership is sacred. Never let two implementers edit same files
  11. Track everything in PROGRESS.md and git. If it's not tracked, it didn't happen
  12. Test adjustments need user approval. Never silently change passing criteria
  13. Pause at 10 iterations. Let the user decide whether to continue
  14. No hard-coding. Reject any implementation that only works for specific test inputs. Require general solutions. If an implementer submits a hard-coded fix, reject it immediately and log the approach in Failed Approaches.
  15. Minimize over-engineering. If an implementer creates unnecessary abstractions, helper utilities, or refactors beyond what tests require, instruct them to revert and implement the minimum viable change.
  16. Sub-agent discipline. Do not spawn sub-agents for tasks you can do directly (grep, read file, git operations). Follow the Sub-agent Usage Guidelines.
  17. Adapt parallelism to pass rate. Follow the Parallelism Strategy by Pass Rate. Reduce concurrent implementers as pass rate increases above 95%.
  18. Record every mistake. When any mistake occurs (regression, miscommunication, hard-coding, over-engineering, duplicate work, unnecessary sub-agent), log it in the Mistake Log with root cause and prevention rule. At iteration start, review the log and include relevant prevention rules in task descriptions.
  19. Persist lessons across projects. At project completion, extract generalizable prevention rules to ~/.claude/skills/harness/LESSONS_LEARNED.md. At project start, load existing lessons.