Agent Skills: test-driven-development

Use when implementing features or fixing bugs - enforces RED-GREEN-REFACTOR cycle requiring tests to fail before writing code

ID: withzombies/hyperpowers/test-driven-development



<skill_overview> Write the test first, watch it fail, write minimal code to pass. If you didn't watch the test fail, you don't know if it tests the right thing. </skill_overview>

<rigidity_level> LOW FREEDOM - Follow these exact steps in order. Do not adapt.

Violating the letter of the rules is violating the spirit of the rules. </rigidity_level>

<quick_reference>

| Phase | Action | Command Example | Expected Result |
|-------|--------|-----------------|-----------------|
| RED | Write failing test | cargo test test_name | FAIL (feature missing) |
| Verify RED | Confirm correct failure | Check error message | "function not found" or assertion fails |
| GREEN | Write minimal code | Implement feature | Test passes |
| Verify GREEN | All tests pass | cargo test | All green, no warnings |
| REFACTOR | Clean up code | Improve while green | Tests still pass |

Iron Law: NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST

</quick_reference>

<when_to_use> Always use for:

  • New features
  • Bug fixes
  • Refactoring with behavior changes
  • Any production code

Ask your human partner for exceptions:

  • Throwaway prototypes (will be deleted)
  • Generated code
  • Configuration files

Thinking "skip TDD just this once"? Stop. That's rationalization. </when_to_use>

<the_process>

1. RED - Write Failing Test

Write one minimal test showing what should happen.

Requirements:

  • Test one behavior only ("and" in name? Split it)
  • Clear name describing behavior
  • Use real code (no mocks unless unavoidable)

See resources/language-examples.md for Rust, Swift, TypeScript examples.
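As an illustrative sketch (the crate name `mycrate` and the `slugify` function are hypothetical, not taken from this skill's resources), a minimal RED test in Rust might look like this - it names and checks exactly one behavior, and the function does not exist yet:

```rust
// tests/slugify_test.rs - written BEFORE any production code exists.
use mycrate::slugify; // hypothetical crate and function

#[test]
fn slugify_replaces_spaces_with_hyphens() {
    // One behavior, named after that behavior. This cannot pass yet:
    // `slugify` has not been written.
    assert_eq!(slugify("Hello World"), "hello-world");
}
```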

2. Verify RED - Watch It Fail

MANDATORY. Never skip.

Run the test and confirm:

  • ✓ Test fails (it does not error out from syntax issues)
  • ✓ Failure message is expected ("function not found" or assertion fails)
  • ✓ Fails because feature missing (not typos)

If the test passes: you're testing existing behavior. Fix the test.
If the test errors: fix the syntax error and re-run until it fails correctly.
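For contrast, here is a hypothetical sketch of a "RED" test that accidentally passes on its first run - it only exercises behavior the standard library already provides, so it proves nothing about the feature you are about to write:

```rust
#[test]
fn lowercases_input() {
    // Passes immediately: this tests std's `to_lowercase`, not your feature.
    // A test that has never failed has never shown it can catch a bug.
    assert_eq!("Hello".to_lowercase(), "hello");
}
```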

3. GREEN - Write Minimal Code

Write simplest code to pass the test. Nothing more.

Key principle: Don't add features the test doesn't require. Don't refactor other code. Don't "improve" beyond the test.
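Continuing the hypothetical `slugify` sketch from the RED step, the GREEN step adds only what that one test demands - no trimming, no punctuation handling, nothing the test doesn't check:

```rust
// src/lib.rs - the simplest thing that makes the failing test pass.
pub fn slugify(input: &str) -> String {
    input.to_lowercase().replace(' ', "-")
}
```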

4. Verify GREEN - Watch It Pass

MANDATORY.

Run tests and confirm:

  • ✓ New test passes
  • ✓ All other tests still pass
  • ✓ No errors or warnings

If the new test fails: fix the code, not the test.
If other tests fail: fix the breakage now, before proceeding.

5. REFACTOR - Clean Up

Only after green:

  • Remove duplication
  • Improve names
  • Extract helpers

Keep tests green. Don't add behavior.
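Still with the hypothetical `slugify` sketch: a refactor improves names and extracts helpers while the existing test stays green and no behavior is added:

```rust
// Same behavior, clearer intent; the RED-step test keeps passing unchanged.
const SLUG_SEPARATOR: &str = "-";

pub fn slugify(input: &str) -> String {
    replace_spaces(&input.to_lowercase())
}

fn replace_spaces(s: &str) -> String {
    s.replace(' ', SLUG_SEPARATOR)
}
```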

6. Repeat

Next failing test for next feature.

</the_process>

<examples> <example> <scenario>Developer writes implementation first, then adds test that passes immediately</scenario> <code>
# Code written FIRST
def validate_email(email):
    return "@" in email  # Bug: accepts "@@"

# Test written AFTER
def test_validate_email():
    assert validate_email("user@example.com")  # Passes immediately!
    # Missing edge case: assert not validate_email("@@")
</code>

<why_it_fails> When test passes immediately:

  • Never proved the test catches bugs
  • Only tested happy path you remembered
  • Forgot edge cases (like "@@")
  • Bug ships to production

Tests written after the code verify the cases you remembered, not the behavior the feature requires. </why_it_fails>

<correction> **TDD approach:**
  1. RED - Write test first (including edge case):
def test_validate_email():
    assert validate_email("user@example.com")  # Will fail - function doesn't exist
    assert not validate_email("@@")            # Edge case up front
  2. Verify RED - Run test, watch it fail:
NameError: name 'validate_email' is not defined
  3. GREEN - Implement to pass both cases:
def validate_email(email):
    return "@" in email and email.count("@") == 1
  4. Verify GREEN - Both assertions pass, bug prevented.

Result: Test failed first, proving it works. Edge case discovered during test writing, not in production. </correction> </example>

<example> <scenario>Developer has already written 3 hours of code without tests. Wants to keep it as "reference" while writing tests.</scenario> <code>
// 200 lines of untested code exists
// Developer thinks: "I'll keep this and write tests that match it"
// Or: "I'll use it as reference to speed up TDD"
</code>

<why_it_fails> Keeping code as "reference":

  • You'll copy it (that's testing after, with extra steps)
  • You'll adapt it (biased by implementation)
  • Tests will match code, not requirements
  • You'll justify shortcuts: "I already know this works"

Result: All the problems of test-after, none of the benefits of TDD. </why_it_fails>

<correction> **Delete it. Completely.**
git stash  # Or delete the file

Then start TDD:

  1. Write first failing test from requirements (not from code)
  2. Watch it fail
  3. Implement fresh (might be different from original, that's OK)
  4. Watch it pass

Why delete:

  • Sunk cost is already gone
  • 3 hours implementing ≠ 3 hours with TDD (TDD might be 2 hours total)
  • Code without tests is technical debt
  • Fresh implementation from tests is usually better

What you gain:

  • Tests that actually verify behavior
  • Confidence code works
  • Ability to refactor safely
  • No bugs from untested edge cases </correction>
</example> <example> <scenario>Test is hard to write. Developer thinks "design must be unclear, but I'll implement first to explore."</scenario> <code>
// Test attempt:
func testUserServiceCreatesAccount() {
    // Need to mock database, email service, payment gateway, logger...
    // This is getting complicated, maybe I should just implement first
}
</code>

<why_it_fails> "Test is hard" is valuable signal:

  • Hard to test = hard to use
  • Too many dependencies = coupling too tight
  • Complex setup = design needs simplification

Implementing first ignores this signal:

  • Build the complex design
  • Lock in the coupling
  • Now forced to write complex tests (or skip them) </why_it_fails>
<correction> **Listen to the test.**

Hard to test? Simplify the interface:

// Instead of:
class UserService {
    init(db: Database, email: EmailService, payments: PaymentGateway, logger: Logger) { }
    func createAccount(email: String, password: String, paymentToken: String) throws { }
}

// Make testable:
class UserService {
    func createAccount(request: CreateAccountRequest) -> Result<Account, Error> {
        // Dependencies injected through request or passed separately
    }
}

Test becomes simple:

func testCreatesAccountFromRequest() {
    let service = UserService()
    let request = CreateAccountRequest(email: "user@example.com")
    let result = service.createAccount(request: request)
    XCTAssertEqual(try result.get().email, "user@example.com")
}

TDD forces good design. If test is hard, fix design before implementing. </correction> </example>

</examples>

<critical_rules>

Rules That Have No Exceptions

  1. Write code before test? → Delete it. Start over.

    • Never keep as "reference"
    • Never "adapt" while writing tests
    • Delete means delete
  2. Test passes immediately? → Not TDD. Fix the test or delete the code.

    • Passing immediately proves nothing
    • You're testing existing behavior, not required behavior
  3. Can't explain why test failed? → Fix until failure makes sense.

    • "function not found" = good (feature doesn't exist)
    • Weird error = bad (fix test, re-run)
  4. Want to skip "just this once"? → That's rationalization. Stop.

    • TDD is faster than debugging in production
    • "Too simple to test" = test takes 30 seconds
    • "Already manually tested" = not systematic, not repeatable

Common Excuses

Every one of these is a signal to stop and follow TDD:

  • "This is different because..."
  • "I'm being pragmatic, not dogmatic"
  • "It's about spirit not ritual"
  • "Tests after achieve the same goals"
  • "Deleting X hours of work is wasteful"

</critical_rules>

<verification_checklist>

Before marking work complete:

  • [ ] Every new function/method has a test
  • [ ] Watched each test fail before implementing
  • [ ] Each test failed for expected reason (feature missing, not typo)
  • [ ] Wrote minimal code to pass each test
  • [ ] All tests pass with no warnings
  • [ ] Tests use real code (mocks only if unavoidable)
  • [ ] Edge cases and errors covered

Can't check all boxes? You skipped TDD. Start over.

</verification_checklist>

<integration>

This skill calls:

  • verification-before-completion (running tests to verify)

This skill is called by:

  • fixing-bugs (write failing test reproducing bug)
  • executing-plans (when implementing bd tasks)
  • refactoring-safely (keep tests green while refactoring)

Agents used:

  • hyperpowers:test-runner (run tests, return summary only)
</integration> <resources>

Detailed language-specific examples:

  • resources/language-examples.md - Rust, Swift, and TypeScript examples

When stuck:

  • Test too complicated? → Design too complicated, simplify interface
  • Must mock everything? → Code too coupled, use dependency injection (see the sketch after this list)
  • Test setup huge? → Extract helpers, or simplify design
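A rough Rust sketch of that dependency-injection point, assuming a hypothetical signup flow that sends a welcome email: depend on a small trait so a test can pass a hand-rolled fake instead of mocking a whole mail service.

```rust
// Hypothetical example: inject the dependency as a trait, not a concrete service.
trait Mailer {
    fn send(&self, to: &str, body: &str);
}

struct Signup<M: Mailer> {
    mailer: M,
}

impl<M: Mailer> Signup<M> {
    fn register(&self, email: &str) {
        // ... create the account, then notify ...
        self.mailer.send(email, "Welcome!");
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use std::cell::RefCell;

    // A hand-rolled fake: records recipients instead of sending mail.
    struct FakeMailer {
        sent: RefCell<Vec<String>>,
    }

    impl Mailer for FakeMailer {
        fn send(&self, to: &str, _body: &str) {
            self.sent.borrow_mut().push(to.to_string());
        }
    }

    #[test]
    fn register_sends_welcome_email() {
        let signup = Signup { mailer: FakeMailer { sent: RefCell::new(Vec::new()) } };
        signup.register("user@example.com");
        assert_eq!(signup.mailer.sent.borrow()[0], "user@example.com");
    }
}
```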
</resources>