Agent Skills: Test Improvement

Review, refactor, and improve test quality. Use when user says "improve tests", "refactor tests", "test coverage", "combine tests", "table-driven", "parametrize", "test.each", "eliminate test waste", or wants to optimize test structure.

UncategorizedID: alexei-led/claude-code-config/improving-tests

Install this agent skill to your local

pnpm dlx add-skill https://github.com/alexei-led/claude-code-config/tree/HEAD/src/skills/improving-tests

Skill Files

Browse the full folder contents for improving-tests.

Download Skill

Loading file tree…

src/skills/improving-tests/SKILL.md

Skill Metadata

Name
improving-tests

Test Improvement

Improve tests by making them behavioral, lean, and useful.

Role-gated action

Detect your capability from your tools, not from prose:

  • Write-capable role (engineer): apply the test changes and run the verification command.
  • Read-only role (reviewer): identify the weak/missing/brittle tests and emit the changes in the Proposed Changes contract under Verify and report. Apply nothing; run nothing — a reviewer has no edit or Bash tools, so the review mode is its natural fit.

Language detection and references

Detect the language from the file extensions in scope and load the matching reference for language-specific test patterns and tooling:

Mixed languages: load each matching reference. Unknown language: use the generic principles below only.

Modes

  • review → identify weak, duplicate, brittle, or missing tests
  • refactor → combine to table-driven/parametrized/test.each, remove waste
  • coverage → add tests for uncovered business behavior
  • tdd → red-green-refactor loop for a feature or bug
  • full → review + refactor + coverage

Testing principles

  • Test behavior through public interfaces, not implementation details.
  • The module interface is the test surface.
  • Mock only system boundaries: external APIs, network, time, randomness, filesystem, subprocesses.
  • Do not mock your own internal collaborators just to make tests easy.
  • Prefer integration-style tests when they give a clear, stable signal.
  • Delete old shallow tests once deeper interface tests cover the behavior.
  • No pointless tests for getters, constructors, default props, or generated glue.

TDD workflow

Use this for test-first or red-green-refactor work.

  1. Confirm the public interface and first behavior.
  2. Write one failing test for one behavior.
  3. Run it and watch it fail for the expected reason.
  4. Implement the smallest code that passes.
  5. Run the narrow test.
  6. Repeat one vertical slice at a time.
  7. Refactor only when green.

Do not write all tests first. Bulk RED creates imagined tests coupled to guessed implementation.

Review workflow

Explore the existing test suite. Engineer runs these commands to gather coverage; reviewer (no Bash) works from the test files in scope plus any coverage output the caller supplies — ask for that context if missing, do not run the commands:

# Go
go test -coverprofile=/tmp/cc-cov.out ./... && go tool cover -func=/tmp/cc-cov.out

# Python
pytest --cov=. --cov-report=term-missing

# TypeScript
bun test --coverage

Look for:

  • tests coupled to private helpers or call counts
  • tests that should be table-driven / parametrized / test.each
  • duplicate scenarios
  • weak mocks hiding real behavior
  • missing success, error, and edge cases on business logic
  • no usable seam for testing real behavior

Preferred consolidation patterns

For refactoring brittle private-helper tests, state the public behavior surface first. Example: create_user(payload) is the primary test surface; _normalize_user_payload() is not. Replace duplicate helper tests and internal call-count assertions with behavior checks through the public API. Mock only system boundaries. Delete shallow duplicates once the public behavior tests cover them.

  • Go — table-driven with t.Run(tc.name, ...)
  • Python@pytest.mark.parametrize with pytest.param()
  • TypeScriptit.each([{ input, expected, name }])

Extract helpers only after 3+ repetitions and only when the helper improves readability.

Verify and report

Engineer runs and names the relevant verification command for the project after applying. Reviewer names the command in the Proposed Changes rationale and does not run it (no Bash). Examples:

go test ./...
pytest -v
bun test

For Python, mention pytest or the project-specific equivalent explicitly. For refactor plans in Python projects, include pytest -v or the repository's configured uv run pytest command by name instead of only saying "run tests." For other stacks, name the equivalent test command instead of saying only "tests passed."

Engineer (applied the changes):

TEST IMPROVEMENT COMPLETE
=========================
Mode: review | refactor | coverage | tdd | full
Tests changed: N
Waste removed: N
Coverage: before → after (if measured)

Key improvements:
- file:line — change

Verification:
- <command> — pass/fail

Reviewer (identified only — emit the changes as a proposal, apply nothing):

## Proposed Changes

### Change 1: <brief description>

File: `path/to/test_file`
Action: CREATE | MODIFY | DELETE

Code:
<complete test code, in the file's language>

Rationale: <weak/missing/brittle test this addresses>

If no tests or framework exist, report that and ask before creating a new testing stack.