TDD Workflow

Strict test-driven development with Result types, dependency injection, and state machine governance.

The Iron Law

NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST

Write code before the test? Delete it. Start over.

No exceptions:

Don't keep it as "reference"
Don't "adapt" it while writing tests
Don't look at it
Delete means delete

Implement fresh from tests. Period.

Rationale: Tests written after the code are biased by the implementation. TDD requires tests to drive design, not verify existing code.

CRITICAL: STATE MACHINE GOVERNANCE

EVERY SINGLE MESSAGE MUST START WITH YOUR CURRENT TDD STATE

Format:

⚪ TDD: PLANNING
🔴 TDD: RED
🟢 TDD: GREEN
🔵 TDD: REFACTOR
🟡 TDD: VERIFY
⚠️ TDD: BLOCKED
🔥 TDD: VIOLATION

NOT JUST THE FIRST MESSAGE. EVERY. SINGLE. MESSAGE.

When you read a file → prefix with TDD state
When you run tests → prefix with TDD state
When you explain results → prefix with TDD state
When you ask a question → prefix with TDD state

Example:

⚪ TDD: PLANNING
Writing test for getUser returning NOT_FOUND...

⚪ TDD: PLANNING
Running npm test to see it fail...

🔴 TDD: RED
Test fails correctly. Implementing minimum solution...

🟢 TDD: GREEN
Test passes. Checking if refactor needed...
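
One way to make the prefix rule hard to forget is to route every outgoing message through a single helper. The sketch below is illustrative only: `TddState`, `STATE_PREFIX`, and `withState` are hypothetical names, not part of this skill or of any library.

```typescript
// Hypothetical helper: none of these names come from the skill itself.
type TddState =
  | 'PLANNING'
  | 'RED'
  | 'GREEN'
  | 'REFACTOR'
  | 'VERIFY'
  | 'BLOCKED'
  | 'VIOLATION';

// Prefix strings copied from the Format list above.
const STATE_PREFIX: Record<TddState, string> = {
  PLANNING: '⚪ TDD: PLANNING',
  RED: '🔴 TDD: RED',
  GREEN: '🟢 TDD: GREEN',
  REFACTOR: '🔵 TDD: REFACTOR',
  VERIFY: '🟡 TDD: VERIFY',
  BLOCKED: '⚠️ TDD: BLOCKED',
  VIOLATION: '🔥 TDD: VIOLATION',
};

// Puts the state prefix on its own first line, followed by the message body.
function withState(state: TddState, message: string): string {
  return `${STATE_PREFIX[state]}\n${message}`;
}
```

Because the state is a required argument, a message cannot be built without naming the state it belongs to.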

State Machine Diagram

                  user request
                       ↓
                 ┌──────────┐
            ┌────│ PLANNING │────┐
            │    └─────┬────┘    │
            │          │         │
            │  test fails        │
            │  correctly         │
  unclear   │          ↓         │ blocker
            │    ┌──────────┐    │
            └────│   RED    │    │
                 │          │    │
                 │ Test IS  │    │
                 │ failing  │    │
                 └────┬─────┘    │
                      │          │
              test    │          │
              passes  │          │
                      ↓          │
                 ┌──────────┐    │
                 │  GREEN   │    │
                 │          │    │
                 │ Test IS  │    │
                 │ passing  │    │
                 └────┬─────┘────┘
                      │
          refactoring │
          needed      │
                      ↓
                 ┌──────────┐
            ┌────│ REFACTOR │
            │    │          │
            │    │ Improve  │
            │    │ design   │
            │    └────┬─────┘
            │         │
            │    done │
            │         │
            │         ↓
            │    ┌──────────┐
            │    │  VERIFY  │
            │    │          │
            │    │ Run full │
  fail      │    │ suite +  │
            │    │ lint +   │
            └────│ build    │
                 └────┬─────┘
                      │
                 pass │
                      │
                      ↓
                 [COMPLETE]
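
The same machine can be written down as a lookup table, which makes illegal jumps easy to spot. This is a sketch that collects the arrows from the Transitions list of each state section; `TRANSITIONS`, `WORKING_STATES`, and `canTransition` are hypothetical names. VIOLATION is left out because it interrupts the cycle rather than being a step in it, and reading "BLOCKED → [any state]" as "any of the five working states" is an assumption.

```typescript
// Hypothetical encoding; the names below are not defined by the skill.
type TddState =
  | 'PLANNING'
  | 'RED'
  | 'GREEN'
  | 'REFACTOR'
  | 'VERIFY'
  | 'BLOCKED'
  | 'COMPLETE';

// Assumption: "any state" after BLOCKED means one of these five.
const WORKING_STATES: readonly TddState[] = [
  'PLANNING',
  'RED',
  'GREEN',
  'REFACTOR',
  'VERIFY',
];

// Allowed targets for each state, as listed in the per-state Transitions sections.
const TRANSITIONS: Record<TddState, readonly TddState[]> = {
  PLANNING: ['RED', 'BLOCKED'],
  RED: ['GREEN', 'BLOCKED', 'PLANNING'],
  GREEN: ['REFACTOR', 'VERIFY', 'RED'],
  REFACTOR: ['VERIFY', 'RED', 'BLOCKED'],
  VERIFY: ['COMPLETE', 'RED', 'REFACTOR', 'BLOCKED'],
  BLOCKED: WORKING_STATES,
  COMPLETE: [],
};

// True when the skill lists `to` as a valid next state for `from`.
function canTransition(from: TddState, to: TddState): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}
```

For example, `canTransition('PLANNING', 'GREEN')` is false: a test has to be seen failing (RED) before anything can be called passing.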

State: PLANNING

Prefix: ⚪ TDD: PLANNING

Purpose: Write a failing test that proves the requirement.

Pre-Conditions

User has provided a task/requirement/bug report
No other TDD cycle in progress

The Delete Rule

If production code exists before the test:

Delete it immediately
Do not keep as "reference"
Do not "adapt" it
Start fresh from the test

Rationale: Tests written after the code are biased by the implementation. TDD requires tests to drive design, not verify existing code.

Actions

Analyze requirement - what behavior needs testing?
Identify edge cases
Write test using fn(args, deps) pattern
Use vitest-mock-extended: const deps = mock<GetUserDeps>() (use mockDeep<GetUserDeps>() when the test stubs nested members such as deps.db.findUser)
Run test (use Bash tool)
VERIFY test fails correctly (MANDATORY - see Verify RED below)
Show exact failure message verbatim
Justify why failure proves test is correct

Verify RED - Watch It Fail

MANDATORY. Never skip.

npm test path/to/test.test.ts

Confirm:

Test fails (not errors)
Failure message is expected
Fails because feature missing (not typos)

Test passes? You're testing existing behavior. Fix test.

Test errors? Fix error, re-run until it fails correctly.

Post-Conditions (ALL required before transition)

[ ] Test written with proper deps mocking
[ ] Test executed
[ ] Test FAILED correctly (not setup error) - VERIFIED by watching it fail
[ ] Failure message shown verbatim
[ ] Failure is "meaningful" (assertion failure, not import error)

Validation Before Transition

Pre-transition validation:
✓ Test written: [yes/no]
✓ Test executed: [yes/no]
✓ Test failed correctly: [yes - output above]
✓ Meaningful failure: [yes - justification]

Transitioning to RED.

Transitions

PLANNING → RED (test fails correctly)
PLANNING → BLOCKED (cannot write valid test)

State: RED

Prefix: 🔴 TDD: RED

Purpose: Test IS failing. Implement ONLY what the error message demands.

Pre-Conditions

Test written and executed
Test IS FAILING correctly
Failure is meaningful (not setup/syntax error)

The Hardcode Rule

Before implementing, announce:

Minimal implementation check:
- Error demands: [what the error literally says]
- Could hardcoded value work? [yes/no]
- If yes: [what hardcoded value]
- If no: [why real logic is required]

| Error Says | Do This |
|-----------|---------|
| expected err('NOT_FOUND') | return err('NOT_FOUND') |
| expected ok(user) | return ok(mockUser) |
| expected count: 0 | return { count: 0 } |

Only add logic when tests FORCE you to.
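
As an illustration of the first table row, here is roughly what the first implementation written in RED could look like when the only failing test expects `err('NOT_FOUND')`. The `Result` and `err` definitions are assumptions made so the snippet is self-contained (the skill does not say which Result library it expects), and `User` and `GetUserDeps` are guesses at the shapes used in the tests.

```typescript
// Assumed Result shape, matching the result.ok / result.error checks in the tests.
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };
const err = <E>(error: E): Result<never, E> => ({ ok: false, error });

// Hypothetical domain and deps types.
type User = { id: string; name: string };
type GetUserError = 'NOT_FOUND';
type GetUserDeps = { db: { findUser(userId: string): Promise<User | null> } };

// The only failing test demands err('NOT_FOUND'), so return exactly that.
// args and deps stay unused until a later test forces a real lookup.
async function getUser(
  _args: { userId: string },
  _deps: GetUserDeps,
): Promise<Result<User, GetUserError>> {
  return err<GetUserError>('NOT_FOUND');
}
```

When a second test such as "returns ok with user when found" fails against this version, that failure is what justifies calling `deps.db.findUser`.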

Actions

Read error message - what does it literally ask for?
Announce minimal implementation check
Implement ONLY what error demands (hardcode if possible)
Run test
VERIFY test PASSES (MANDATORY - see Verify GREEN below)
Run typecheck (npm run typecheck)
Run lint (npm run lint)
Show all output verbatim
Transition to GREEN

Verify GREEN - Watch It Pass

MANDATORY. Never skip.

npm test path/to/test.test.ts

Confirm:

Test passes
Other tests still pass
Output pristine (no errors, warnings)

Test fails? Fix code, not test.

Other tests fail? Fix now.

Post-Conditions (ALL required)

[ ] Implemented ONLY what error demanded
[ ] Test executed and PASSES
[ ] Code compiles (no TypeScript errors)
[ ] Code lints (no ESLint errors)
[ ] All output shown verbatim

Validation Before Transition

Post-condition validation:
✓ Test PASSES: [yes - output above]
✓ Code compiles: [yes - output above]
✓ Code lints: [yes - output above]
✓ Implementation minimal: [justification]

Transitioning to GREEN.

Critical Rules

NEVER transition to GREEN without test PASS + compile + lint
NEVER anticipate future errors - address THIS error only
NEVER change test to match implementation - fix the code

Transitions

RED → GREEN (test passes, code compiles, code lints)
RED → BLOCKED (cannot make test pass)
RED → PLANNING (test failure reveals misunderstood requirement)

State: GREEN

Prefix: 🟢 TDD: GREEN

Purpose: Test IS passing. Assess code quality and decide next step.

Pre-Conditions

Test exists and PASSES
Code compiles and lints
Implementation is minimal

Actions

Review implementation against patterns:

| Pattern | Check |
|---------|-------|
| fn(args, deps) | Is deps type explicit and minimal? |
| Result types | Are all error cases typed? |
| Validation | Is Zod at boundary only? |
| Naming | Domain-specific, not generic? |

Decide: Does code need refactoring?
If YES → REFACTOR
If NO → VERIFY

Post-Conditions

[ ] Test IS PASSING
[ ] Code quality assessed
[ ] Decision made: refactor or verify

Transitions

GREEN → REFACTOR (improvements needed)
GREEN → VERIFY (code is clean)
GREEN → RED (test starts failing - regression)

State: REFACTOR

Prefix: 🔵 TDD: REFACTOR

Purpose: Tests ARE passing. Improve design while keeping green.

Pre-Conditions

Tests ARE PASSING
Refactoring needs identified

Refactor Checklist

// 1. Extract deps type
// BEFORE
async function getUser(
  args: { userId: string },
  deps: { db: Database; logger: Logger }
)

// AFTER
type GetUserDeps = { db: Database; logger: Logger };
async function getUser(args: { userId: string }, deps: GetUserDeps)

// 2. Type all errors
// BEFORE
return err('error')

// AFTER
type GetUserError = 'NOT_FOUND' | 'DB_ERROR';
return err<GetUserError>('NOT_FOUND')

// 3. Use domain names
// BEFORE
const data = await deps.db.get(id);

// AFTER
const user = await deps.db.findUser(userId);
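
Put together, the three checklist items might yield something like the function below. This is a sketch under assumptions, not the skill's reference implementation: `Result`, `ok`, and `err` are defined inline, `Database`, `Logger`, and `User` are hypothetical stand-ins for the real types, and each branch is assumed to have been forced by its own failing test first. In a real module the deps and error types would be exported.

```typescript
// Assumed Result shape and helpers.
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };
const ok = <T>(value: T): Result<T, never> => ({ ok: true, value });
const err = <E>(error: E): Result<never, E> => ({ ok: false, error });

// Hypothetical stand-ins for the Database and Logger types used above.
type User = { id: string; name: string };
type Database = { findUser(userId: string): Promise<User | null> };
type Logger = { error(message: string): void };

// 1. Deps type extracted and named.
type GetUserDeps = { db: Database; logger: Logger };
// 2. Every error case is part of the signature.
type GetUserError = 'NOT_FOUND' | 'DB_ERROR';

async function getUser(
  args: { userId: string },
  deps: GetUserDeps,
): Promise<Result<User, GetUserError>> {
  try {
    // 3. Domain names: user and findUser instead of data and get.
    const user = await deps.db.findUser(args.userId);
    return user ? ok(user) : err<GetUserError>('NOT_FOUND');
  } catch {
    deps.logger.error(`findUser failed for ${args.userId}`);
    return err<GetUserError>('DB_ERROR');
  }
}
```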

Actions

Apply ONE refactoring
Run tests - verify still pass
Repeat until no more improvements
Transition to VERIFY

Post-Conditions

[ ] Deps type is explicit and exported
[ ] All error types are explicit in signature
[ ] Names are domain-specific
[ ] Tests still pass after each refactor

When to Skip REFACTOR

Code is already clean
Single hardcoded value (generalize when second test forces it)

Never skip when:

Deps type missing or inline
Error types not explicit
Generic names (data, util, result)

Transitions

REFACTOR → VERIFY (code quality satisfactory)
REFACTOR → RED (refactor broke test)
REFACTOR → BLOCKED (cannot refactor due to constraints)

State: VERIFY

Prefix: 🟡 TDD: VERIFY

Purpose: Run full test suite + lint + build before claiming complete.

Actions

Run full test suite: npm test
Run lint: npm run lint
Run typecheck: npm run typecheck
Run build: npm run build (if applicable)
Show ALL output verbatim
If ALL pass → COMPLETE
If ANY fail → route appropriately

Post-Conditions (ALL required)

[ ] Full test suite executed - ALL pass
[ ] Lint executed - passes
[ ] Typecheck executed - passes
[ ] Build executed - succeeds
[ ] All output shown

Validation Before Completion

Final validation:
✓ Full test suite: [X/X tests passed - output shown]
✓ Lint: [passed - output shown]
✓ Typecheck: [passed - output shown]
✓ Build: [succeeded - output shown]

TDD cycle COMPLETE.

Summary:
- Tests written: [count]
- Refactorings: [count]

Transitions

VERIFY → COMPLETE (all checks pass)
VERIFY → RED (tests fail - regression)
VERIFY → REFACTOR (lint fails - code quality)
VERIFY → BLOCKED (build fails - structural issue)
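
The routing above can be sketched as a small pure function. `CheckResults` and `routeAfterVerify` are hypothetical names. The Transitions list does not say where a typecheck failure goes, so typecheck is left out here, and the order in which simultaneous failures are considered (tests, then lint, then build) is an assumption.

```typescript
// Hypothetical shape: true means the check passed.
type CheckResults = { tests: boolean; lint: boolean; build: boolean };
type NextState = 'COMPLETE' | 'RED' | 'REFACTOR' | 'BLOCKED';

// Maps VERIFY outcomes to the next state listed under Transitions.
function routeAfterVerify(results: CheckResults): NextState {
  if (!results.tests) return 'RED'; // tests fail: regression
  if (!results.lint) return 'REFACTOR'; // lint fails: code quality
  if (!results.build) return 'BLOCKED'; // build fails: structural issue
  return 'COMPLETE';
}
```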

State: BLOCKED

Prefix: ⚠️ TDD: BLOCKED

Purpose: Handle situations where progress cannot continue.

Actions

Clearly explain blocking issue
Explain which state you were in
Explain what you were trying to do
Suggest possible resolutions
STOP and wait for user guidance

Critical Rules

NEVER improvise workarounds
NEVER skip steps to "unblock" yourself
ALWAYS stop and wait for user

Transitions

BLOCKED → [any state] (based on user guidance)

State: VIOLATION

Prefix: 🔥 TDD: VIOLATION

Purpose: Handle state machine violations.

Triggers

Forgot state announcement
Skipped state
Failed to validate post-conditions
Claimed complete without evidence
Changed test assertion to match implementation
Implemented full solution when hardcode would work

Actions

IMMEDIATELY announce: 🔥 TDD: VIOLATION
Explain which rule was violated
Explain what you did wrong
Announce correct current state
Ask user permission to recover

Example

🔥 TDD: VIOLATION

Violation: Implemented full user lookup logic when hardcoded
value would satisfy the single test case.

Correct action: Return hardcoded { id: '123', name: 'Alice' }

Correct state: RED

May I recover to RED and fix the implementation?

Test Structure for fn(args, deps)

import { describe, it, expect } from 'vitest';
import { mockDeep } from 'vitest-mock-extended';
import { getUser, type GetUserDeps } from './get-user';

describe('getUser', () => {
  it('returns err NOT_FOUND when user does not exist', async () => {
    // Arrange: mock deps. mockDeep is used because findUser is nested
    // under deps.db; plain mock() only stubs top-level members.
    const deps = mockDeep<GetUserDeps>();
    deps.db.findUser.mockResolvedValue(null);

    // Act
    const result = await getUser({ userId: '123' }, deps);

    // Assert Result type
    expect(result.ok).toBe(false);
    if (!result.ok) {
      // The guard narrows the Result union so .error is accessible
      expect(result.error).toBe('NOT_FOUND');
    }
  });

  it('returns ok with user when found', async () => {
    const mockUser = { id: '123', name: 'Alice' };
    const deps = mockDeep<GetUserDeps>();
    deps.db.findUser.mockResolvedValue(mockUser);

    const result = await getUser({ userId: '123' }, deps);

    expect(result.ok).toBe(true);
    if (result.ok) {
      expect(result.value).toEqual(mockUser);
    }
  });
});

Meaningful vs Setup Failures

| Type | Example | Action |
|------|---------|--------|
| Meaningful | "expected err('NOT_FOUND') but got ok(user)" | Proceed to RED |
| Meaningful | "expected false, received true" | Proceed to RED |
| Setup | "Cannot find module './get-user'" | Fix import, stay in PLANNING |
| Setup | "Property 'findUser' does not exist" | Fix mock, stay in PLANNING |
| Setup | "TypeError: deps.db is undefined" | Fix setup, stay in PLANNING |
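
The distinction in the table can be approximated with a crude string check. This is a rough heuristic for illustration, not something the skill defines: `classifyFailure` and `SETUP_MARKERS` are hypothetical, the markers are simply lifted from the Setup rows, and anything that matches neither pattern is reported as `unknown` so it still gets read by a human.

```typescript
type FailureKind = 'meaningful' | 'setup' | 'unknown';

// Substrings taken from the Setup rows of the table above; extend as needed.
const SETUP_MARKERS = ['Cannot find module', 'does not exist', 'TypeError'];

// Heuristic only: setup markers win, assertion wording comes second.
function classifyFailure(message: string): FailureKind {
  const isSetup = SETUP_MARKERS.some((marker) => message.indexOf(marker) !== -1);
  if (isSetup) return 'setup';
  if (/expected/i.test(message)) return 'meaningful';
  return 'unknown';
}
```

A `setup` result means staying in PLANNING and fixing the test harness; only a `meaningful` result supports the move to RED.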

Red Flags - STOP and Start Over

If you encounter any of these, delete code and start over with TDD:

Code before test
Test after implementation
Test passes immediately (didn't watch it fail)
Can't explain why test failed
Tests added "later"
Rationalizing "just this once"
"I already manually tested it"
"Keep as reference" or "adapt existing code"
"Already spent X hours, deleting is wasteful"
"TDD is dogmatic, I'm being pragmatic"
"This is different because..."

All of these mean: Delete code. Start over with TDD.

Common Rationalizations

| Excuse | Reality |
|--------|--------|
| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
| "I'll test after" | Tests passing immediately prove nothing. |
| "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" |
| "Already manually tested" | Ad-hoc ≠ systematic. No record, can't re-run. |
| "Deleting X hours is wasteful" | Sunk cost fallacy. Keeping unverified code is technical debt. |
| "Keep as reference, write tests first" | You'll adapt it. That's testing after. Delete means delete. |
| "Need to explore first" | Fine. Throw away exploration, start with TDD. |
| "Test hard = design unclear" | Listen to test. Hard to test = hard to use. |
| "TDD will slow me down" | TDD faster than debugging. Pragmatic = test-first. |
| "Manual test faster" | Manual doesn't prove edge cases. You'll re-test every change. |

Thinking "skip TDD just this once"? Stop. That's rationalization.

Critical Rules Summary

| Rule | Enforcement |
|------|-------------|
| Iron Law | NO production code without failing test first |
| Delete code rule | If code exists before test, delete it |
| State announcement | EVERY message starts with state |
| Verify RED | MANDATORY - watch test fail |
| Verify GREEN | MANDATORY - watch test pass |
| No green without proof | Must see test pass output |
| Error message driven | Implement ONLY what error demands |
| Hardcode first | One test? Return expected value |
| Never change test | Fix implementation, not assertion |
| Validate before transition | Check all post-conditions |

Never Use vi.mock() for App Logic

// WRONG: vi.mock creates brittle path coupling
vi.mock('../infra/database', () => ({ db: mockDb }));

// CORRECT: Inject deps, mock with vitest-mock-extended
// (mockDeep, because findUser is nested under deps.db)
const deps = mockDeep<GetUserDeps>();
deps.db.findUser.mockResolvedValue(mockUser);

Quick Reference

| State | Prefix | Action | Exit Condition |
|-------|--------|--------|----------------|
| PLANNING | ⚪ | Write failing test | Test fails meaningfully |
| RED | 🔴 | Minimum implementation | Test passes + compile + lint |
| GREEN | 🟢 | Assess quality | Decision: refactor or verify |
| REFACTOR | 🔵 | Improve design | Tests still pass, code clean |
| VERIFY | 🟡 | Full suite + lint + build | All green |
| BLOCKED | ⚠️ | Explain blocker, wait | User guidance |
| VIOLATION | 🔥 | Acknowledge, recover | Permission to continue |
