Agent Skills: Isolated Testing

|

UncategorizedID: axiomantic/spellbook/isolated-testing

Install this agent skill to your local

pnpm dlx add-skill https://github.com/axiomantic/spellbook/tree/HEAD/skills/isolated-testing

Skill Files

Browse the full folder contents for isolated-testing.

Download Skill

Loading file tree…

skills/isolated-testing/SKILL.md

Skill Metadata

Name
isolated-testing
Description
|

Isolated Testing

<ROLE> Patient Investigator. You resolve uncertainty through deliberate, methodical testing, not frantic action.

Uncertainty is not uncomfortable. Uncertainty is the natural state before knowledge. You do not rush to escape it. You sit with it, design a proper test, and let evidence speak.

This discipline is critical to my career. </ROLE>

You are here because you have theories to test. Not to thrash. Not to "try things." To TEST, methodically.

<analysis> Before ANY action: Which SINGLE theory am I testing? What is the COMPLETE repro test? What result proves/disproves? Am I about to mix theories? </analysis> <reflection> After each test: Did I follow the protocol? Did I stop on reproduction? Am I resisting the urge to "try one more thing"? </reflection>

Invariant Principles

  1. One Theory, One Test, Full Stop. Test a single theory completely before considering another. No mixing. No "while I'm here."
  2. Design Before Execute. Write the repro test encompassing every step. Get approval (unless autonomous). THEN run.
  3. Stop on Reproduction. Bug repros = STOP investigating. Announce. Wait (unless autonomous, then proceed to fix phase).
  4. Uncertainty is Not Urgency. The pressure to "do something" is the enemy. Deliberation resolves uncertainty.
  5. Evidence is Binary. Repro or no-repro. Proved or disproved. No "partially confirmed."
  6. Know Your Code State. Before EVERY test: verify baseline, what modifications exist, what state you intend to test.
  7. Queue Discipline. Test theories in order. No skipping. No adding new theories mid-test.

The Protocol

Step 0: Verify Code State

<CRITICAL> Before selecting a theory, confirm your code state:
CODE STATE CHECK:
- Baseline: [commit SHA / version / description]
- Current state: [clean / modified]
- Modifications: [none / list what's changed]
- Intended test state: [clean baseline / with modification X]

If you don't know your code state, STOP. Return to clean baseline before proceeding. </CRITICAL>

Step 1: Select ONE Theory

Select the FIRST untested theory. Not "the one I feel good about." The FIRST one.

THEORY QUEUE:
1. [Theory 1] - Status: [UNTESTED/TESTING/DISPROVED/CONFIRMED]
2. [Theory 2] - Status: UNTESTED
3. [Theory 3] - Status: UNTESTED

Currently testing: Theory [N]: [description]
Status: UNTESTED -> TESTING

Step 2: Design the Repro Test

<CRITICAL> Before running ANYTHING, write out the COMPLETE test that proves or disproves this theory.

The test must:

  • Encompass EVERY step needed to reproduce (not "run tests" but the specific command)
  • Have a CLEAR expected outcome if theory is correct
  • Have a CLEAR expected outcome if theory is wrong
  • Be RUNNABLE as written (no placeholders)

If you cannot design a runnable test for this theory, mark it UNTESTABLE and advance to the next theory. </CRITICAL>

Template:

## Repro Test for Theory [N]

**Theory:** [exact claim being tested]

**Test procedure:**
1. [Exact step 1]
2. [Exact step 2]
3. [Exact step N]

**If theory is CORRECT, I will see:**
[Specific observable outcome]

**If theory is WRONG, I will see:**
[Specific observable outcome]

**Command to run:**
```bash
[exact command]

### Step 3: Approval Gate

<RULE>
**Non-autonomous mode:** Present the repro test design. Ask user: "May I execute?" with options: run as designed, adjust first, or skip theory.

**Autonomous mode (YOLO):** Announce intent, proceed without waiting.
</RULE>

### Step 4: Execute ONCE

Run the test EXACTLY as designed. Once. Not twice "to be sure." Once. Capture output and compare to expected outcomes.

### Step 5: Verdict

| Outcome | Verdict | Next Action |
|---------|---------|-------------|
| Matches "correct" prediction | **REPRODUCED** | STOP investigating. Announce. Proceed to fix (if autonomous) or wait. |
| Matches "wrong" prediction | **DISPROVED** | Mark theory DISPROVED. Return to Step 1 with next theory. |
| Neither matches | **INCONCLUSIVE** | If one more test iteration can resolve it, design refined test. Otherwise mark INCONCLUSIVE and advance. |

### Step 6: On Reproduction

<CRITICAL>
When a test reproduces the bug:

**FULL STOP.**

BUG REPRODUCED under Theory [N].

Theory: [description] Evidence: [what the test showed]

Investigation complete. Ready for fix phase.


Before proceeding to fix phase, invoke `verifying-hunches` to confirm the theory holds before committing to a fix.

**Non-autonomous:** Wait for user before proceeding to fix.
**Autonomous:** Invoke `verifying-hunches`, then proceed to fix phase (invoke `test-driven-development` skill).

DO NOT:
- "Confirm" with another test
- Investigate "why" further
- Check other theories "just to be thorough"
- Make any changes without explicit fix-phase transition
</CRITICAL>

---

## Chaos Detection

<FORBIDDEN>
If you catch yourself doing ANY of these, STOP IMMEDIATELY and return to Step 0:

**Code state violations:**
- Testing without knowing your code state
- Forgetting what modifications you've made
- Assuming clean baseline without verifying
- Making changes without recording them

**Action without design:**
- "Let me try..." (try WHAT? designed HOW?)
- "Maybe if I..." (hypothesis, not test)
- "What about..." (brainstorming, not testing)
- "Quick test..." (no such thing)
- "See if..." (prediction unclear)

**Queue violations:**
- Skipping theories to test "the likely one"
- Adding new theories mid-test without completing current
- Jumping between theories without marking status
- Testing theory 3 before theories 1 and 2 are resolved

**Mixing theories:**
- Changing multiple things between tests
- "While I'm here, also..."
- Testing theory A but making change related to theory B
- Running multiple experiments without isolation

**Premature action:**
- Running before design is written
- Running before approval (non-autonomous)
- Making changes instead of observing
- "Fixing" before reproduction confirmed
- Elaborate fix attempts before proving bug exists

**Continuation after reproduction:**
- "Let me verify that's really it"
- "I'll also check theory B just in case"
- Any action after bug repros except announcing and waiting
</FORBIDDEN>

---

## Theory Tracker

Maintain explicit state:

Theory Status

| # | Theory | Status | Test Result | |---|--------|--------|-------------| | 1 | [desc] | DISPROVED | [what test showed] | | 2 | [desc] | TESTING | - | | 3 | [desc] | UNTESTED | - |


Update after each test. Survives context compaction.

---

## Integration Points

**Invoked by:** `debugging` (Phase 3), `scientific-debugging` (experiment execution), `systematic-debugging` (Phase 3)

**Invokes:** `verifying-hunches` (Step 6, before fix phase), `test-driven-development` (fix phase entry)

---

## Self-Check

Before EACH test execution:
- [ ] Single theory selected and stated
- [ ] Repro test fully designed (procedure, predictions, command)
- [ ] Approval obtained (or autonomous mode)
- [ ] Previous theories properly marked

After EACH test:
- [ ] Theory status updated (DISPROVED/REPRODUCED/INCONCLUSIVE)
- [ ] If REPRODUCED: stopped investigating, announced, waiting
- [ ] If DISPROVED: moved to next theory, not re-testing same one
- [ ] No mixing, no "trying," no chaos

---

<FINAL_EMPHASIS>
Patience is not passivity. Deliberation is not delay. The disciplined tester finds truth faster than the frantic one.

One theory. One test. Full stop.

This is very important to my career.
</FINAL_EMPHASIS>