Smart Test Selection Skill
Purpose
Optimizes test execution by intelligently selecting which tests to run based on code changes. Instead of running the full test suite every time, this skill:
- Maps code changes to affected test files using import dependency analysis
- Provides tiered testing strategies for different feedback loop needs
- Tracks test reliability to prioritize stable tests in fast runs
When I Activate
I automatically load when you mention:
- "run affected tests" or "run impacted tests"
- "smart test" or "intelligent testing"
- "which tests to run" or "test selection"
- "fast tests" or "quick tests"
- "tests for changes" or "tests for this PR"
Core Concepts
Test Tiers
Tier 1: Fast Tests (< 1 minute)
- Directly affected unit tests (tests that import the changed file)
- High-reliability tests only (no flaky tests)
- Run on every save or pre-commit
- Command:
pytest -m "not slow and not integration" [selected_tests]
Tier 2: Impacted Tests (< 5 minutes)
- All tests affected by changes (direct + transitive dependencies)
- Includes integration tests for changed modules
- Run before commit or on PR draft
- Command:
pytest [selected_tests]
Tier 3: Full Suite
- Complete test suite
- Run on PR ready-for-review or CI
- Command:
pytest
Import Dependency Analysis
The skill builds a dependency graph by analyzing Python imports:
source_file.py
|
+-- Imported by: module_a.py, module_b.py
| |
| +-- Tested by: test_module_a.py, test_module_b.py
|
+-- Tested by: test_source_file.py (direct test)
Direct Tests: Files matching pattern test_{module}.py or {module}_test.py
Indirect Tests: Tests that import modules which import the changed file
Reliability Tracking
Tests are scored on reliability (0.0 to 1.0):
- 1.0: Always passes (stable)
- 0.5-0.9: Occasional failures (investigate)
- < 0.5: Frequently fails (flaky)
Tests with reliability below 0.8 are treated as flaky and excluded from Tier 1 (see Step 4).
Reliability is tracked in ~/.amplihack/.claude/data/test-mapping/reliability.yaml
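As a minimal sketch of how a score could be derived: the passes and failures fields stored in reliability.yaml are consistent with reliability being passes divided by total recorded runs, but the skill's actual formula is not stated here, so treat that as an assumption. The function name reliability_score is hypothetical.

```python
from typing import Optional


def reliability_score(passes: int, failures: int) -> Optional[float]:
    """Fraction of recorded runs that passed.

    Assumption: reliability = passes / (passes + failures).
    Returns None when no runs have been recorded, so callers can
    distinguish "no data yet" from "always fails".
    """
    total = passes + failures
    if total == 0:
        return None
    return round(passes / total, 2)
```

With the sample records from reliability.yaml, 98 passes and 2 failures give 0.98, and 45 passes and 15 failures give 0.75.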
Usage
Analyze Changes and Get Test Commands
User: What tests should I run for my changes?
Claude (using smart-test):
1. Analyzes git diff or staged changes
2. Maps changed files to test dependencies
3. Returns tiered test commands
Example Output:
------------------------------------------
Smart Test Analysis
------------------------------------------
Changed Files:
- src/amplihack/core/processor.py
- src/amplihack/utils/helpers.py
Tier 1 (Fast - 45s estimated):
pytest tests/unit/test_processor.py tests/unit/test_helpers.py -v
Tier 2 (Impacted - 3m estimated):
pytest tests/unit/test_processor.py tests/unit/test_helpers.py \
tests/integration/test_pipeline.py -v
Tier 3 (Full - 12m estimated):
pytest
Recommendation: Start with Tier 1 for quick feedback.
Build or Refresh Mapping Cache
User: Build the test mapping for this project
Claude:
1. Scans all Python files
2. Builds import dependency graph
3. Maps source files to test files
4. Saves to .claude/data/test-mapping/code_to_tests.yaml
Check Test Reliability
User: Show flaky tests
Claude:
1. Reads reliability.yaml
2. Lists tests with reliability < 0.8
3. Suggests investigation or quarantine
Process
Step 1: Identify Changed Files
# For staged changes
git diff --cached --name-only --diff-filter=ACMR
# For all uncommitted changes
git diff --name-only --diff-filter=ACMR
# For PR changes (vs main)
git diff main...HEAD --name-only --diff-filter=ACMR
Filter to only Python source files (exclude tests themselves for mapping).
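The step above could be scripted roughly as follows. This is a sketch, not the skill's implementation: the helper names (changed_files, is_test_file, python_sources) are hypothetical, and the test-file check assumes the test_{module}.py / {module}_test.py naming patterns described under Import Dependency Analysis.

```python
import subprocess
from pathlib import PurePosixPath


def changed_files(diff_args=("--cached",)):
    """Return paths from `git diff <diff_args> --name-only --diff-filter=ACMR`.

    Pass () for all uncommitted changes or ("main...HEAD",) for PR changes.
    """
    result = subprocess.run(
        ["git", "diff", *diff_args, "--name-only", "--diff-filter=ACMR"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def is_test_file(path: str) -> bool:
    """True for files named test_*.py or *_test.py (assumed naming convention)."""
    name = PurePosixPath(path).name
    return name.endswith(".py") and (
        name.startswith("test_") or name.endswith("_test.py")
    )


def python_sources(paths):
    """Keep Python source files and drop test files, preserving order."""
    return [p for p in paths if p.endswith(".py") and not is_test_file(p)]
```

Keeping the filter separate from the git call means it can be checked without a repository.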
Step 2: Build Import Graph
For each Python file, extract imports:
# Patterns to detect:
import module
from module import item
from package.module import item
from . import relative
from ..parent import item
Build bidirectional mapping:
- Forward: file -> what it imports
- Reverse: file -> what imports it
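One way to sketch this step is with the standard-library ast module, used here only to read import statements. The skill's own analyzer is not shown in this document, so the function names and the choice of ast are assumptions. Relative imports keep their leading dots and are not resolved to absolute module names in this sketch.

```python
import ast
from collections import defaultdict


def extract_imports(source: str) -> set:
    """Return the module names imported by a piece of Python source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # import module / import a, b
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            prefix = "." * node.level  # "" for absolute imports
            if node.module:
                # from package.module import item / from ..parent import item
                modules.add(prefix + node.module)
            else:
                # from . import relative
                modules.update(prefix + alias.name for alias in node.names)
    return modules


def build_graph(sources: dict):
    """Build (forward, reverse) maps from {module_name: source_text}.

    forward[name] is the set of modules that name imports;
    reverse[name] is the set of modules that import name.
    """
    forward = {name: extract_imports(text) for name, text in sources.items()}
    reverse = defaultdict(set)
    for name, imported in forward.items():
        for target in imported:
            reverse[target].add(name)
    return forward, dict(reverse)
```

Because ast.parse raises SyntaxError on invalid source, a caller would need to catch that error for files that do not parse.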
Step 3: Map to Tests
For each changed file, find tests via:
- Direct test match: test_{filename}.py or {filename}_test.py
- Import-based: Tests that import the changed module
- Transitive: Tests that import modules that import the changed module (1 level)
Step 4: Apply Reliability Filter
For Tier 1 only, exclude tests with reliability < 0.8.
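A minimal sketch of the filter, assuming reliability scores are available as a dict keyed by test ID (as in reliability.yaml). Keeping tests that have no recorded score is an assumption of this sketch, not documented behavior, and the function name is hypothetical.

```python
def filter_reliable(test_ids, reliability, threshold=0.8, default=1.0):
    """Drop tests whose recorded reliability is below the threshold.

    Tests with no recorded score fall back to `default` and are kept
    (an assumption: unknown tests are not treated as flaky).
    """
    return [t for t in test_ids if reliability.get(t, default) >= threshold]
```

With the sample data, tests/integration/test_api.py::test_timeout (0.75) is dropped from Tier 1 while tests/unit/test_processor.py::test_basic (0.98) is kept.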
Step 5: Generate Commands
Output pytest commands with appropriate markers:
# Tier 1
pytest -m "not slow and not integration" tests/a.py tests/b.py
# Tier 2
pytest tests/a.py tests/b.py tests/c.py
# Tier 3
pytest
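The commands above could be assembled as in this sketch. The function name build_commands is hypothetical, and returning None for an empty selection is an assumption added here, since a bare pytest -m "..." with no paths would run the whole suite rather than nothing.

```python
import shlex

# Marker expression used for Tier 1 in this document.
TIER1_MARKERS = "not slow and not integration"


def build_commands(tier1_tests, tier2_tests):
    """Return {tier: shell command or None} for the three tiers."""
    tier1 = (
        shlex.join(["pytest", "-m", TIER1_MARKERS, *tier1_tests])
        if tier1_tests
        else None
    )
    tier2 = shlex.join(["pytest", *tier2_tests]) if tier2_tests else None
    return {1: tier1, 2: tier2, 3: "pytest"}
```

shlex.join (Python 3.8+) quotes the marker expression with single quotes, which is equivalent to the double-quoted form in POSIX shells.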
Data Storage
code_to_tests.yaml
# .claude/data/test-mapping/code_to_tests.yaml
version: 1
last_updated: "2025-11-25T10:00:00Z"
mappings:
src/amplihack/core/processor.py:
direct_tests:
- tests/unit/test_processor.py
indirect_tests:
- tests/integration/test_pipeline.py
transitive_tests:
- tests/e2e/test_full_workflow.py
src/amplihack/utils/helpers.py:
direct_tests:
- tests/unit/test_helpers.py
indirect_tests:
- tests/unit/test_processor.py # processor imports helpers
reliability.yaml
# .claude/data/test-mapping/reliability.yaml
version: 1
last_updated: "2025-11-25T10:00:00Z"
tests:
tests/unit/test_processor.py::test_basic:
passes: 98
failures: 2
reliability: 0.98
last_failure: "2025-11-20"
tests/integration/test_api.py::test_timeout:
passes: 45
failures: 15
reliability: 0.75
last_failure: "2025-11-24"
flaky_reason: "Network dependent"
Integration with Workflow
This skill integrates with DEFAULT_WORKFLOW.md:
Step 12: Run Tests and Pre-commit Hooks
- Use Tier 1 (fast) for pre-commit
- Quick feedback on changed code
Step 13: Mandatory Local Testing
- Use Tier 2 (impacted) before commit
- Ensures affected code paths are tested
CI Pipeline
- Use Tier 2 on draft PRs
- Use Tier 3 (full) on ready-for-review PRs
Markers Integration
Works with existing pytest markers from pyproject.toml:
- slow - Excluded from Tier 1
- integration - Excluded from Tier 1
- e2e - Excluded from Tier 1 and 2
- neo4j - Requires special environment
- requires_docker - Requires Docker daemon
Quick Reference
| Scenario | Tier | Time Budget | Command Pattern |
| ---------- | ---- | ----------- | --------------------------------- |
| Pre-commit | 1 | < 1 min | pytest -m "not slow and not integration" [affected] |
| Pre-push | 2 | < 5 min | pytest [affected + transitive] |
| Draft PR | 2 | < 5 min | pytest [affected + transitive] |
| Ready PR | 3 | Full | pytest |
| CI main | 3 | Full | pytest |
Philosophy Alignment
Ruthless Simplicity
- Simple tier system (1, 2, 3)
- YAML storage over database
- Import analysis over complex AST parsing
Zero-BS Implementation
- Real pytest commands (copy-paste ready)
- Actual time estimates based on test count
- No placeholder data or mock reliability scores
Testing Pyramid
- Tier 1 prioritizes unit tests (60%)
- Tier 2 adds integration tests (30%)
- Tier 3 includes E2E tests (10%)
Complementary Skills
- test-gap-analyzer: Identifies missing tests
- qa-team: Creates E2E and parity test scenarios (outside-in-testing alias supported)
- tester agent: Writes new tests for gaps
- pre-commit-diagnostic: Fixes pre-commit failures
Common Patterns
Pattern 1: Quick Iteration
[Developer makes small change]
Claude: Run affected tests (Tier 1)
[45 seconds later]
Claude: 3/3 tests passed. Ready for commit.
Pattern 2: Pre-Push Validation
[Developer about to push]
Claude: Run impacted tests (Tier 2)
[3 minutes later]
Claude: 12/12 tests passed including integrations.
Pattern 3: Flaky Test Investigation
User: Tests keep failing randomly
Claude: Checking reliability data...
Found 2 flaky tests (< 0.8 reliability):
- test_api_timeout (0.75) - Network dependent
- test_concurrent_write (0.68) - Race condition
Recommend: Quarantine these tests or fix root cause.
Limitations
- Python-only import analysis
- Single-level transitive analysis (deeper chains excluded)
- Reliability data requires initial seeding from test runs
- Does not detect dynamic imports or string-based imports
When to Avoid
Do NOT use smart-test when:
- First time setting up tests - No mapping cache exists yet; run full suite first
- Major refactoring - When module structure changes significantly, mappings become stale
- Configuration changes - Changes to pytest.ini, conftest.py, or fixtures affect all tests
- CI environment variables changed - Environment-dependent tests may all need re-running
- Database schema migrations - All database-touching tests should run
- Flaky test investigation - Run full suite to get accurate reliability data
- Pre-merge final check - Always run Tier 3 (full suite) before merging to main
Rule of thumb: When in doubt, run the full suite. Smart-test optimizes iteration speed, not correctness.
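Some of these cases can be detected mechanically. The sketch below covers only two of them (a missing mapping cache and changes to test configuration files); the function name is hypothetical, and including pyproject.toml in the file set is an assumption based on the pytest markers being defined there.

```python
from pathlib import PurePosixPath

# pytest.ini and conftest.py come from the list above;
# pyproject.toml is an assumption (it holds the pytest markers).
GLOBAL_CONFIG_FILES = {"pytest.ini", "conftest.py", "pyproject.toml"}


def needs_full_suite(changed_paths, cache_exists=True) -> bool:
    """True when test selection should be skipped in favor of the full suite."""
    if not cache_exists:
        return True  # no mapping yet: nothing to select from
    return any(PurePosixPath(p).name in GLOBAL_CONFIG_FILES for p in changed_paths)
```

Cases such as schema migrations or changed CI environment variables are not visible from file names alone and still need human judgment.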
Error Handling and Troubleshooting
Common Issues
Issue: "No tests found for changed file"
Cause: File is new or not yet mapped
Fix: Rebuild the mapping cache
User: "Rebuild test mapping cache"
Issue: "Import analysis failed"
Cause: Syntax error in Python file or circular imports
Fix:
1. Check the file for syntax errors: python -m py_compile file.py
2. Resolve circular imports
3. Rebuild mapping cache
Issue: "Reliability data missing"
Cause: No test runs have been recorded yet
Fix: Run full test suite once, then:
User: "Update test reliability with these results"
Issue: "Tier 1 tests taking too long"
Cause: Too many tests marked as "fast" or slow tests not marked
Fix:
1. Add @pytest.mark.slow to tests that take longer than 1 second
2. Add @pytest.mark.integration to integration tests
3. Review test granularity
Issue: "Cache is stale / wrong tests selected"
Cause: Module structure changed since last cache build
Fix: Delete cache and rebuild:
rm -rf .claude/data/test-mapping/*.yaml
User: "Rebuild test mapping cache"
Recovery Commands
# Verify test mapping is valid
python -c "import yaml; yaml.safe_load(open('.claude/data/test-mapping/code_to_tests.yaml'))"
# Check reliability data
python -c "import yaml; print(yaml.safe_load(open('.claude/data/test-mapping/reliability.yaml')))"
# Bypass smart-test: run the suite directly, skipping only files named test_slow_*
pytest --ignore-glob='**/test_slow_*'
# List test file names (input for spotting orphaned tests that have no source mapping)
find tests -name "test_*.py" -exec basename {} \; | sort > /tmp/tests.txt
Cache Maintenance
The mapping cache should be rebuilt when:
- New test files are added
- Module structure changes significantly
- Cache is older than 7 days
Trigger manually: "Rebuild test mapping cache"
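The 7-day rule could be checked against the last_updated field stored in code_to_tests.yaml, as in this sketch. The function name is hypothetical, and the timestamp is assumed to be ISO 8601 in UTC, as in the sample files.

```python
from datetime import datetime, timedelta, timezone

MAX_CACHE_AGE = timedelta(days=7)


def cache_is_stale(last_updated: str, now=None) -> bool:
    """True if the cache timestamp is more than 7 days old.

    last_updated is an ISO 8601 string such as "2025-11-25T10:00:00Z".
    """
    now = now or datetime.now(timezone.utc)
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11,
    # so normalize it to an explicit UTC offset first.
    updated = datetime.fromisoformat(last_updated.replace("Z", "+00:00"))
    return now - updated > MAX_CACHE_AGE
```

A cache that is exactly 7 days old is still treated as fresh here; only older caches trigger a rebuild.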
Note: Start with Tier 1 for rapid feedback. Passing Tier 1 tests suggest the directly affected code has not regressed, but they do not cover transitive or integration paths. Escalate to higher tiers when approaching commit or push milestones.