Test-driven development (XP-style)
Tests define the specification. Design them from requirements before any implementation. The RED-GREEN-REFACTOR cycle is the heartbeat: write a failing test, make it pass with minimal code, then clean up while green.
Modern insight (2025): pairing TDD with property-based testing is now standard practice -- example tests prevent regressions, property tests discover edge cases. TDD also serves AI-assisted development: structural integrity keeps code understandable for both human and AI collaborators (Kent Beck, "Augmented Coding"). Mutation testing validates test quality beyond coverage metrics (TDD + mutation testing: 63.3% mutation coverage vs. 39.4% for TDD alone).
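A minimal sketch of the example-test/property-test pairing, using only the stdlib `random` module for case generation (dedicated tools such as Hypothesis generate and shrink cases automatically). The `slug` function and both tests are illustrative, not from the source:

```python
import random

def slug(text: str) -> str:
    """Lowercase and replace spaces with hyphens."""
    return text.lower().replace(" ", "-")

# Example test: pins one concrete case, guards against regressions.
def test_slug_example():
    assert slug("Hello World") == "hello-world"

# Property test (hand-rolled with stdlib random): for ANY input, the
# result contains no spaces and no uppercase letters.
def test_slug_properties():
    rng = random.Random(0)
    alphabet = "abc XYZ 123"
    for _ in range(200):
        s = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 20)))
        out = slug(s)
        assert " " not in out and out == out.lower()
```

The example test documents intent for one known case; the property test sweeps inputs the author never thought to write down.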
See frameworks for language-specific test runners, property testing, coverage, and mutation tools. See examples for brief TDD cycle patterns per language.
When to Apply
- New features with clear requirements (both inside-out and outside-in approaches valid)
- Bug fixes -- write a failing test that proves the bug before fixing
- Refactoring -- ensure coverage exists before restructuring
- API contract enforcement -- test the interface, not internals
- Property-based invariants -- complement example tests with PBT
- Legacy code -- add characterization tests before modifying (Michael Feathers pattern)
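The bug-fix pattern above can be sketched with a hypothetical `parse_price` helper (the function name and bug are illustrative, not from the source):

```python
# Hypothetical bug: parse_price("1,299.00") used to crash on
# thousands separators.
def parse_price(text: str) -> float:
    # Fix: strip separators before converting.
    return float(text.replace(",", ""))

# Regression test written FIRST, while the bug was still present:
# it failed with ValueError before the fix, proving the bug, and
# now pins the corrected behavior forever.
def test_parse_price_handles_thousands_separator():
    assert parse_price("1,299.00") == 1299.0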
When NOT to Apply
- Exploratory prototyping or spike research
- One-off scripts, data migrations, generated code
- Purely visual UI layout work (prefer visual regression testing)
- Highly experimental algorithmic research (but PBT still helps)
- Throwaway code with <1 week lifespan
Anti-patterns
- Test-last: Writing tests after implementation defeats the design benefit
- Testing implementation details: Tests should verify behavior, not internal structure -- breaks refactoring confidence
- Over-mocking: Testing the mocks instead of the code; mock external I/O, not core logic
- Skipping RED: Tests that never fail aren't tests -- they verify nothing
- 100% coverage obsession: Coverage does not equal quality. Mutation testing exposes gaps coverage cannot
- Refactoring on RED: Never restructure with failing tests
- Test-induced architectural damage: Letting mock boundaries dictate design
- Snapshot bloat: Approval-style tests without curation become a maintenance burden
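The "testing implementation details" anti-pattern can be made concrete with a toy `Counter` class (names are illustrative):

```python
class Counter:
    def __init__(self):
        self._events = []          # internal storage, free to change

    def record(self, name):
        self._events.append(name)

    def count(self, name):
        return self._events.count(name)

# GOOD: verifies observable behavior. Survives refactoring _events
# into, say, a dict of tallies.
def test_count_reflects_recorded_events():
    c = Counter()
    c.record("click")
    c.record("click")
    assert c.count("click") == 2

# BAD (anti-pattern): couples the test to internal structure.
# def test_events_list():
#     c = Counter()
#     c.record("click")
#     assert c._events == ["click"]   # breaks on any internal refactor
```

The good test keeps refactoring confidence: any internal restructuring that preserves behavior leaves it green.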
Two Schools (decision guidance, not prescription)
- Inside-Out (Classic/Detroit): Start with unit tests for smallest pieces, build upward. Minimizes mocks. Best for well-understood domains, algorithms, utility functions.
- Outside-In (London/Mockist): Start with acceptance test for user-facing behavior, use mocks to discover interfaces. Best for layered systems, APIs, microservices.
- Pragmatic teams use both depending on context. Neither is superior.
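An outside-in sketch using `unittest.mock`: the test starts from user-facing behavior ("checkout charges the payment gateway") and the mock reveals the collaborator interface to build next (a gateway with a `charge(amount)` method). All names here are illustrative:

```python
from unittest.mock import Mock

class Checkout:
    def __init__(self, gateway):
        self.gateway = gateway

    def complete(self, amount):
        self.gateway.charge(amount)
        return "ok"

# Acceptance-style test written first; the mock stands in for the
# not-yet-written gateway and discovers its interface.
def test_checkout_charges_gateway():
    gateway = Mock()
    result = Checkout(gateway).complete(42)
    gateway.charge.assert_called_once_with(42)  # interaction-based check
    assert result == "ok"
```

An inside-out team would instead start by unit-testing a real gateway and build the `Checkout` layer on top of it, with no mock.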
Test Doubles Hierarchy
- Stubs: Return predefined data; verify outcomes (state-based)
- Mocks: Verify interactions/calls were made (behavior-based)
- Fakes: Working implementations (e.g., in-memory database)
- Spies: Record calls while using real behavior
- Rule: Mock external dependencies. Never mock core domain logic.
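The hierarchy above, sketched with `unittest.mock` and a hypothetical `greet`/repository pair (names are illustrative):

```python
from unittest.mock import Mock

def greet(repo, user_id):
    user = repo.find(user_id)
    return f"Hello, {user['name']}"

# Stub: canned return value; we assert on the OUTCOME (state-based).
def test_greet_with_stub():
    repo = Mock()
    repo.find.return_value = {"name": "Ada"}
    assert greet(repo, 1) == "Hello, Ada"

# Mock: we assert on the INTERACTION (behavior-based).
def test_greet_queries_repo():
    repo = Mock()
    repo.find.return_value = {"name": "Ada"}
    greet(repo, 1)
    repo.find.assert_called_once_with(1)

# Fake: a working in-memory implementation, reusable across tests.
class FakeRepo:
    def __init__(self):
        self._users = {}
    def save(self, user_id, user):
        self._users[user_id] = user
    def find(self, user_id):
        return self._users[user_id]

def test_greet_with_fake():
    repo = FakeRepo()
    repo.save(1, {"name": "Ada"})
    assert greet(repo, 1) == "Hello, Ada"

# Spy: real behavior plus call recording (Mock wrapping a real object).
def test_greet_with_spy():
    real = FakeRepo()
    real.save(1, {"name": "Ada"})
    spy = Mock(wraps=real)
    assert greet(spy, 1) == "Hello, Ada"
    spy.find.assert_called_once_with(1)
```

Note the rule in action: the repository (an I/O boundary) is doubled, while `greet` itself runs for real in every test.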
Workflow (language-neutral)
- CREATE -- Write failing tests: error cases -> edge cases -> happy paths -> property tests
- RED -- Run tests, verify all fail. If any pass, the test is wrong or behavior already exists.
- GREEN -- Minimal code to pass. No extras, no optimization, no cleanup.
- REFACTOR -- Clean up while green. Separate structural changes from behavioral (Tidy First). Re-run tests after every change.
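One cycle of the workflow above for a hypothetical `divide` helper, with tests added in the recommended order (error case -> edge case -> happy path); names and cases are illustrative:

```python
def divide(a, b):
    if b == 0:
        raise ValueError("division by zero")  # error case, implemented first
    return a / b

# 1. Error case first. On RED this test failed because divide()
#    did not yet raise; the guard clause above made it GREEN.
def test_divide_rejects_zero_divisor():
    try:
        divide(1, 0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")

# 2. Edge case: negative operands.
def test_divide_negative():
    assert divide(-6, 3) == -2

# 3. Happy path.
def test_divide_happy_path():
    assert divide(6, 3) == 2
```

Each test is written, seen to fail, and made to pass before the next is added; refactoring happens only between cycles, with all tests green.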
Constitutional Rules (Non-Negotiable)
- Design Tests First: Plan all test cases from requirements before implementation; write each test iteratively in the RED-GREEN-REFACTOR loop
- RED Before GREEN: Each new test MUST fail before you write its implementation
- Error Cases First: Implement error handling before success paths
- One Test at a Time: Write one failing test, make it pass, refactor, then add the next test
- Refactor Only on GREEN: Never refactor with failing tests
Validation Gates
| Gate | Pass Criteria | Blocking |
|------|---------------|----------|
| Tests Created | Test files exist for target module | Yes |
| RED State | All new tests fail before implementation | Yes |
| GREEN State | All tests pass after implementation | Yes |
| Coverage | >= 80% line coverage | No |
| Mutation | Mutation score reviewed (no threshold enforced) | No |
Exit Codes
| Code | Meaning |
|------|---------|
| 0 | TDD cycle complete, all tests pass |
| 11 | No test framework detected |
| 12 | Test compilation failed |
| 13 | Tests not failing (RED state invalid) |
| 14 | Tests fail after implementation (GREEN not achieved) |
| 15 | Tests fail after refactor (regression) |