Rust Test Writing Skill
Write tests that catch real bugs. Every test must guard a specific invariant -- not just prove the code "works."
The "What Could Break?" Framework
Before writing any test, answer these four questions:
- What invariant does this code maintain? (e.g., "deserialized config always has a default profile")
- What edge case would violate it? (e.g., "empty TOML table, missing key, extra unknown key")
- What platform difference could surface? (e.g., "path separators, case sensitivity, symlink behavior")
- What would a future refactor accidentally break? (e.g., "field added to struct but not to Display impl")
If you can only answer #1, your test is a happy-path test. Answer all four and you have a regression suite.
The 5 Test Transformations
Each transformation shows a superficial test pattern and its expert replacement.
Transformation 1: Weak assertions -> whole-object comparison
// BEFORE: proves nothing about the rest of the struct
let result = parse_config(input)?;
assert!(result.is_ok());
// AFTER: catches any unexpected field change
use pretty_assertions::assert_eq;
let result = parse_config(input)?;
assert_eq!(result, Config {
name: "default".into(),
timeout: Duration::from_secs(30),
retries: 3,
verbose: false,
});
Transformation 2: Single happy-path -> targeted test suite
// BEFORE: one test, one path
#[test]
fn test_parse_config() {
let cfg = parse("valid input").unwrap();
assert!(cfg.is_valid());
}
// AFTER: 3-6 tests covering happy, error, edge, platform
#[test]
fn parse_config_returns_defaults_for_minimal_input() { .. }
#[test]
fn parse_config_rejects_negative_timeout() { .. }
#[test]
fn parse_config_preserves_unknown_fields_as_extensions() { .. }
#[test]
fn parse_config_handles_empty_string_gracefully() { .. }
#[cfg(windows)]
#[test]
fn parse_config_normalizes_backslash_paths() { .. }
Transformation 3: Inline test module -> sibling _tests.rs file
// BEFORE: tests pollute the production file diff
// foo.rs
pub fn compute() -> u32 { 42 }
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn it_works() { assert_eq!(compute(), 42); }
}
// AFTER: production code and test code in sibling files
// foo.rs
pub fn compute() -> u32 { 42 }
#[cfg(test)]
#[path = "foo_tests.rs"]
mod tests;
// foo_tests.rs
use super::*;
#[test]
fn compute_returns_expected_value() { assert_eq!(compute(), 42); }
For mod.rs modules, use mod_tests.rs.
Transformation 4: String-based test data -> typed struct construction
// BEFORE: silent breakage when fields change
let input: Config = serde_json::from_str(r#"{"name":"test","timeout":30}"#)?;
// AFTER: compile-time safety for field additions/renames
fn make_config(name: &str, timeout_secs: u64) -> Config {
Config {
name: name.to_string(),
timeout: Duration::from_secs(timeout_secs),
retries: 0,
verbose: false,
}
}
let input = make_config("test", 30);
Factory functions for domain objects let each test construct exactly the fixture it needs. No shared mutable state. No JSON parsing at test time.
Transformation 5: HashMap in fixtures -> BTreeMap for determinism
// BEFORE: test passes 99% of the time, flakes in CI
let mut map = HashMap::new();
map.insert("b", 2);
map.insert("a", 1);
assert_eq!(format!("{map:?}"), r#"{"a": 1, "b": 2}"#); // order not guaranteed
// AFTER: deterministic iteration order
let mut map = BTreeMap::new();
map.insert("b", 2);
map.insert("a", 1);
assert_eq!(format!("{map:?}"), r#"{"a": 1, "b": 2}"#); // always this order
Use BTreeMap whenever output order affects assertions or snapshots.
Test Flake Hunting Protocol
Bolin's single most frequent pattern (97+ references across 30+ commits). When a test is flaky, follow this exact protocol:
- Identify the race window -- read the event-emission code, find where timing assumptions break. Locate the exact line where the test assumes an event has arrived or a state has changed without proof.
- Replace timing with event-driven sync -- wait for a specific event instead of sleeping or assuming order. Never use
sleepas a synchronization primitive. - Make assertions order-independent -- sort collected values, use sets, or match by content not position. Non-deterministic event ordering is not a bug; asserting on it is.
- Stress-test the fix -- run with the exact command:
cargo nextest run -p <crate> -j 2 --no-fail-fast --stress-count 50 --status-level leak - Document the non-determinism in the commit message -- explain why the timing assumption was wrong and what synchronization replaced it.
// BEFORE (timing-dependent):
tokio::time::sleep(Duration::from_millis(100)).await;
assert_eq!(events.len(), 2);
assert_eq!(events[0].type_name, "item.create");
assert_eq!(events[1].type_name, "audio.delta");
// AFTER (event-driven, order-independent):
wait_for_event(&rx, |e| e.type_name == "item.create").await;
wait_for_event(&rx, |e| e.type_name == "audio.delta").await;
// OR: collect, sort, compare
let mut types: Vec<_> = events.iter().map(|e| &e.type_name).collect();
types.sort();
assert_eq!(types, vec!["audio.delta", "item.create"]);
Common flake sources to watch for:
turn/startedemitted optimistically before state is actually ready- Event ordering across async channels (mpsc, broadcast)
- HashMap iteration order in serialized output
- Test harness Drop racing with child process shutdown (close stdin first, then wait, then kill)
Intent-Based Assertions
Replace exact command-string matching with intent-based semantic matching. Check that the test observes the right INTENT (operation + target) rather than a specific command format that varies across platforms or refactors.
// BEFORE: brittle -- breaks if command formatting changes
assert_eq!(cmd.to_string(), "rm -rf /tmp/workspace/build");
// AFTER: intent-based -- asserts the operation and target
assert_eq!(cmd.operation(), Operation::Remove);
assert!(cmd.target().ends_with("workspace/build"));
assert!(cmd.is_recursive());
When exact strings are unavoidable, assert on the semantically meaningful parts (path suffix, flag presence) rather than the full formatted string.
Test Naming Convention
Pattern: {subject}_{scenario}_{expected_outcome}
A failed test name must be an actionable bug description. When it fails in CI, the name alone tells you what broke. The name should read as a specification: if it fails, you know exactly what invariant was violated.
Exemplary names:
sandbox_detection_requires_keywords
sandbox_detection_ignores_non_sandbox_mode
aggregate_output_rebalances_when_stderr_is_small
parse_config_rejects_negative_timeout
permissions_profiles_reject_writes_outside_workspace_root
permissions_profiles_allow_network_enablement
legacy_sandbox_mode_config_builds_split_policies_without_drift
under_development_features_are_disabled_by_default
usage_limit_reached_error_formats_free_plan
unexpected_status_cloudflare_html_is_simplified
root_write_plus_carveouts_still_requires_platform_sandbox
explicit_unreadable_paths_prevent_auto_approval_for_external_sandbox
denied_hosts_take_priority_over_allowed_hosts_glob
Anti-pattern names: test_parse, it_works, test_config_1, happy_path.
Testing Stack Quick Reference
| Scenario | Tool |
|----------|------|
| HTTP mocking | wiremock::MockServer |
| Filesystem isolation | TempDir (tempfile crate) |
| Async tests | #[tokio::test] |
| UI / output snapshots | insta::assert_snapshot! |
| Struct comparison | pretty_assertions::assert_eq |
| Enum variant checks | assert_matches! |
| Deterministic collections | BTreeMap over HashMap |
| Flake stress-testing | cargo nextest run --stress-count 50 |
When to use each
- wiremock: Any test that would hit a real HTTP endpoint. Mount responses with
Mock::given().respond_with(). Assert request bodies after the test. - TempDir: Every test that touches disk. Never mutate the process environment. Never hardcode
/tmporC:\. - insta: TUI widgets, CLI output, error messages -- anything where the exact text matters. Render to a buffer, snapshot with
assert_snapshot!. - pretty_assertions: Default for all
assert_eq!calls. Gives colored diffs on failure. Import at the top of every test file. - nextest stress: After fixing any flaky test. Always run with
-j 2 --no-fail-fast --stress-count 50to confirm the fix holds under concurrency.
Self-Review Checklist
After writing tests, verify every item. Fix every violation before presenting the tests.
[ ] Uses pretty_assertions::assert_eq (not std assert_eq)
[ ] Compares entire objects, not individual fields
[ ] Each test guards a specific invariant (not just "it works")
[ ] Test names follow {subject}_{scenario}_{expected_outcome}
[ ] Test names encode the invariant being guarded
[ ] TempDir for any filesystem tests (no hardcoded paths)
[ ] No process environment mutation (no std::env::set_var)
[ ] Error paths tested (not just happy path)
[ ] At least 3 tests for any non-trivial function
[ ] BTreeMap used where iteration order affects assertions
[ ] Test file is a sibling _tests.rs, not inline mod tests {}
[ ] No timing-dependent assertions (no sleep -> assert)
[ ] Order-independent where event order is non-deterministic
[ ] String assertions use intent matching, not exact format