Rust Test Writing Skill Skill

Rust Test Writing Skill

Write tests that catch real bugs. Every test must guard a specific invariant -- not just prove the code "works."

The "What Could Break?" Framework

Before writing any test, answer these four questions:

What invariant does this code maintain? (e.g., "deserialized config always has a default profile")
What edge case would violate it? (e.g., "empty TOML table, missing key, extra unknown key")
What platform difference could surface? (e.g., "path separators, case sensitivity, symlink behavior")
What would a future refactor accidentally break? (e.g., "field added to struct but not to Display impl")

If you can only answer #1, your test is a happy-path test. Answer all four and you have a regression suite.

The 5 Test Transformations

Each transformation shows a superficial test pattern and its expert replacement.

Transformation 1: Weak assertions -> whole-object comparison

// BEFORE: proves nothing about the rest of the struct
let result = parse_config(input)?;
assert!(result.is_ok());

// AFTER: catches any unexpected field change
use pretty_assertions::assert_eq;
let result = parse_config(input)?;
assert_eq!(result, Config {
    name: "default".into(),
    timeout: Duration::from_secs(30),
    retries: 3,
    verbose: false,
});

Transformation 2: Single happy-path -> targeted test suite

// BEFORE: one test, one path
#[test]
fn test_parse_config() {
    let cfg = parse("valid input").unwrap();
    assert!(cfg.is_valid());
}

// AFTER: 3-6 tests covering happy, error, edge, platform
#[test]
fn parse_config_returns_defaults_for_minimal_input() { .. }
#[test]
fn parse_config_rejects_negative_timeout() { .. }
#[test]
fn parse_config_preserves_unknown_fields_as_extensions() { .. }
#[test]
fn parse_config_handles_empty_string_gracefully() { .. }
#[cfg(windows)]
#[test]
fn parse_config_normalizes_backslash_paths() { .. }

Transformation 3: Inline test module -> sibling `_tests.rs` file

// BEFORE: tests pollute the production file diff
// foo.rs
pub fn compute() -> u32 { 42 }
#[cfg(test)]
mod tests {
    use super::*;
    #[test]
    fn it_works() { assert_eq!(compute(), 42); }
}

// AFTER: production code and test code in sibling files
// foo.rs
pub fn compute() -> u32 { 42 }
#[cfg(test)]
#[path = "foo_tests.rs"]
mod tests;

// foo_tests.rs
use super::*;
#[test]
fn compute_returns_expected_value() { assert_eq!(compute(), 42); }

For mod.rs modules, use mod_tests.rs.

Transformation 4: String-based test data -> typed struct construction

// BEFORE: silent breakage when fields change
let input: Config = serde_json::from_str(r#"{"name":"test","timeout":30}"#)?;

// AFTER: compile-time safety for field additions/renames
fn make_config(name: &str, timeout_secs: u64) -> Config {
    Config {
        name: name.to_string(),
        timeout: Duration::from_secs(timeout_secs),
        retries: 0,
        verbose: false,
    }
}
let input = make_config("test", 30);

Factory functions for domain objects let each test construct exactly the fixture it needs. No shared mutable state. No JSON parsing at test time.

Transformation 5: `HashMap` in fixtures -> `BTreeMap` for determinism

// BEFORE: test passes 99% of the time, flakes in CI
let mut map = HashMap::new();
map.insert("b", 2);
map.insert("a", 1);
assert_eq!(format!("{map:?}"), r#"{"a": 1, "b": 2}"#); // order not guaranteed

// AFTER: deterministic iteration order
let mut map = BTreeMap::new();
map.insert("b", 2);
map.insert("a", 1);
assert_eq!(format!("{map:?}"), r#"{"a": 1, "b": 2}"#); // always this order

Use BTreeMap whenever output order affects assertions or snapshots.

Test Flake Hunting Protocol

Bolin's single most frequent pattern (97+ references across 30+ commits). When a test is flaky, follow this exact protocol:

Identify the race window -- read the event-emission code, find where timing assumptions break. Locate the exact line where the test assumes an event has arrived or a state has changed without proof.
Replace timing with event-driven sync -- wait for a specific event instead of sleeping or assuming order. Never use sleep as a synchronization primitive.
Make assertions order-independent -- sort collected values, use sets, or match by content not position. Non-deterministic event ordering is not a bug; asserting on it is.
Stress-test the fix -- run with the exact command: cargo nextest run -p <crate> -j 2 --no-fail-fast --stress-count 50 --status-level leak
Document the non-determinism in the commit message -- explain why the timing assumption was wrong and what synchronization replaced it.

// BEFORE (timing-dependent):
tokio::time::sleep(Duration::from_millis(100)).await;
assert_eq!(events.len(), 2);
assert_eq!(events[0].type_name, "item.create");
assert_eq!(events[1].type_name, "audio.delta");

// AFTER (event-driven, order-independent):
wait_for_event(&rx, |e| e.type_name == "item.create").await;
wait_for_event(&rx, |e| e.type_name == "audio.delta").await;
// OR: collect, sort, compare
let mut types: Vec<_> = events.iter().map(|e| &e.type_name).collect();
types.sort();
assert_eq!(types, vec!["audio.delta", "item.create"]);

Common flake sources to watch for:

turn/started emitted optimistically before state is actually ready
Event ordering across async channels (mpsc, broadcast)
HashMap iteration order in serialized output
Test harness Drop racing with child process shutdown (close stdin first, then wait, then kill)

Intent-Based Assertions

Replace exact command-string matching with intent-based semantic matching. Check that the test observes the right INTENT (operation + target) rather than a specific command format that varies across platforms or refactors.

// BEFORE: brittle -- breaks if command formatting changes
assert_eq!(cmd.to_string(), "rm -rf /tmp/workspace/build");

// AFTER: intent-based -- asserts the operation and target
assert_eq!(cmd.operation(), Operation::Remove);
assert!(cmd.target().ends_with("workspace/build"));
assert!(cmd.is_recursive());

When exact strings are unavoidable, assert on the semantically meaningful parts (path suffix, flag presence) rather than the full formatted string.

Test Naming Convention

Pattern: {subject}_{scenario}_{expected_outcome}

A failed test name must be an actionable bug description. When it fails in CI, the name alone tells you what broke. The name should read as a specification: if it fails, you know exactly what invariant was violated.

Exemplary names:

sandbox_detection_requires_keywords
sandbox_detection_ignores_non_sandbox_mode
aggregate_output_rebalances_when_stderr_is_small
parse_config_rejects_negative_timeout
permissions_profiles_reject_writes_outside_workspace_root
permissions_profiles_allow_network_enablement
legacy_sandbox_mode_config_builds_split_policies_without_drift
under_development_features_are_disabled_by_default
usage_limit_reached_error_formats_free_plan
unexpected_status_cloudflare_html_is_simplified
root_write_plus_carveouts_still_requires_platform_sandbox
explicit_unreadable_paths_prevent_auto_approval_for_external_sandbox
denied_hosts_take_priority_over_allowed_hosts_glob

Anti-pattern names: test_parse, it_works, test_config_1, happy_path.

Testing Stack Quick Reference

| Scenario | Tool | |----------|------| | HTTP mocking | wiremock::MockServer | | Filesystem isolation | TempDir (tempfile crate) | | Async tests | #[tokio::test] | | UI / output snapshots | insta::assert_snapshot! | | Struct comparison | pretty_assertions::assert_eq | | Enum variant checks | assert_matches! | | Deterministic collections | BTreeMap over HashMap | | Flake stress-testing | cargo nextest run --stress-count 50 |

When to use each

wiremock: Any test that would hit a real HTTP endpoint. Mount responses with Mock::given().respond_with(). Assert request bodies after the test.
TempDir: Every test that touches disk. Never mutate the process environment. Never hardcode /tmp or C:\.
insta: TUI widgets, CLI output, error messages -- anything where the exact text matters. Render to a buffer, snapshot with assert_snapshot!.
pretty_assertions: Default for all assert_eq! calls. Gives colored diffs on failure. Import at the top of every test file.
nextest stress: After fixing any flaky test. Always run with -j 2 --no-fail-fast --stress-count 50 to confirm the fix holds under concurrency.

Self-Review Checklist

After writing tests, verify every item. Fix every violation before presenting the tests.

[ ] Uses pretty_assertions::assert_eq (not std assert_eq)
[ ] Compares entire objects, not individual fields
[ ] Each test guards a specific invariant (not just "it works")
[ ] Test names follow {subject}_{scenario}_{expected_outcome}
[ ] Test names encode the invariant being guarded
[ ] TempDir for any filesystem tests (no hardcoded paths)
[ ] No process environment mutation (no std::env::set_var)
[ ] Error paths tested (not just happy path)
[ ] At least 3 tests for any non-trivial function
[ ] BTreeMap used where iteration order affects assertions
[ ] Test file is a sibling _tests.rs, not inline mod tests {}
[ ] No timing-dependent assertions (no sleep -> assert)
[ ] Order-independent where event order is non-deterministic
[ ] String assertions use intent matching, not exact format

Agent Skills: Rust Test Writing Skill

Install this agent skill to your local

Skill Files