Domain Error Strategy Skill

Layer 2: Design Choices

Core Question

Who needs to handle this error, and how should they recover?

Before designing error types:

- Is this user-facing or internal?
- Is recovery possible?
- What context is needed for debugging?

Error Categorization

| Error Type | Audience | Recovery | Example |
|------------|----------|----------|---------|
| User-facing | End users | Guide action | InvalidEmail, NotFound |
| Internal | Developers | Debug info | DatabaseError, ParseError |
| System | Ops/SRE | Monitor/alert | ConnectionTimeout, RateLimited |
| Transient | Automation | Retry | NetworkError, ServiceUnavailable |
| Permanent | Human | Investigate | ConfigInvalid, DataCorrupted |
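
The categories in the table can be attached to concrete errors so that callers branch on the category rather than on individual variants. A minimal sketch, assuming a hypothetical `Category` enum and `DomainError` type (neither comes from a library; the variant names are borrowed from the Example column):

```rust
/// Hypothetical categories mirroring the Error Type column.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Category {
    UserFacing,
    Internal,
    System,
    Transient,
    Permanent,
}

impl Category {
    /// Recovery action taken from the Recovery column.
    pub fn recovery(self) -> &'static str {
        match self {
            Category::UserFacing => "guide action",
            Category::Internal => "debug info",
            Category::System => "monitor/alert",
            Category::Transient => "retry",
            Category::Permanent => "investigate",
        }
    }
}

/// Illustrative errors; one per category.
#[derive(Debug)]
pub enum DomainError {
    InvalidEmail,
    DatabaseError,
    ConnectionTimeout,
    ServiceUnavailable,
    ConfigInvalid,
}

impl DomainError {
    /// Maps each concrete error to the category that decides its handling.
    pub fn category(&self) -> Category {
        match self {
            DomainError::InvalidEmail => Category::UserFacing,
            DomainError::DatabaseError => Category::Internal,
            DomainError::ConnectionTimeout => Category::System,
            DomainError::ServiceUnavailable => Category::Transient,
            DomainError::ConfigInvalid => Category::Permanent,
        }
    }
}
```

Keeping the mapping in one `match` means the compiler flags any new variant that has not been categorized yet.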

Thinking Prompt

Before designing error types:

Who sees this error?
- End user → friendly message, actionable
- Developer → detailed, debuggable
- Ops → structured, alertable
Can we recover?
- Transient → retry with backoff
- Degradable → fallback value
- Permanent → fail fast, alert
What context is needed?
- Call chain → anyhow::Context
- Request ID → structured logging
- Input data → error payload
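
For the context question, the request ID and the offending input can travel with the error itself. A minimal std-only sketch, assuming a hypothetical `WithContext` wrapper and `parse_quantity` function; in a real project `anyhow::Context` or `tracing` fields would typically fill this role:

```rust
use std::error::Error;
use std::fmt;
use std::num::ParseIntError;

/// Hypothetical wrapper that carries debugging context alongside the cause.
#[derive(Debug)]
pub struct WithContext {
    pub request_id: String,
    pub input: String,
    pub source: ParseIntError,
}

impl fmt::Display for WithContext {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "request {} failed for input {:?}", self.request_id, self.input)
    }
}

impl Error for WithContext {
    // Exposes the underlying cause so loggers can walk the error chain.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}

/// Parses a quantity and attaches the request ID and raw input on failure.
pub fn parse_quantity(request_id: &str, raw: &str) -> Result<u32, WithContext> {
    raw.trim().parse::<u32>().map_err(|e| WithContext {
        request_id: request_id.to_string(),
        input: raw.to_string(),
        source: e,
    })
}
```

The `Display` text stays short, while the original `ParseIntError` remains reachable through `source()` for anyone debugging the failure.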

Trace Up ↑

To domain constraints (Layer 3):

"How should I handle payment failures?"
    ↑ Ask: What are the business rules for retries?
    ↑ Check: domain-fintech (transaction requirements)
    ↑ Check: SLA (availability requirements)

| Question | Trace To | Ask |
|----------|----------|-----|
| Retry policy | domain-* | What's acceptable latency for retry? |
| User experience | domain-* | What message should users see? |
| Compliance | domain-* | What must be logged for audit? |

Trace Down ↓

To implementation (Layer 1):

"Need typed errors"
    ↓ m06-error-handling: thiserror for library
    ↓ m04-zero-cost: Error enum design

"Need error context"
    ↓ m06-error-handling: anyhow::Context
    ↓ Logging: tracing with fields

"Need retry logic"
    ↓ m07-concurrency: async retry patterns
    ↓ Crates: tokio-retry, backoff

Quick Reference

| Recovery Pattern | When | Implementation |
|------------------|------|----------------|
| Retry | Transient failures | exponential backoff |
| Fallback | Degraded mode | cached/default value |
| Circuit Breaker | Cascading failures | failsafe-rs |
| Timeout | Slow operations | tokio::time::timeout |
| Bulkhead | Isolation | separate thread pools |
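
To illustrate the circuit breaker row without pulling in a crate, here is a minimal single-threaded sketch. The `CircuitBreaker` type, its fields, and its thresholds are hypothetical and do not reflect the API of any circuit breaker crate:

```rust
use std::time::{Duration, Instant};

/// Hypothetical consecutive-failure circuit breaker (not thread-safe).
pub struct CircuitBreaker {
    failure_threshold: u32,
    cooldown: Duration,
    consecutive_failures: u32,
    opened_at: Option<Instant>,
}

#[derive(Debug, PartialEq)]
pub enum BreakerError<E> {
    /// The breaker is open; the operation was not attempted.
    Open,
    /// The operation ran and failed.
    Inner(E),
}

impl CircuitBreaker {
    pub fn new(failure_threshold: u32, cooldown: Duration) -> Self {
        Self {
            failure_threshold,
            cooldown,
            consecutive_failures: 0,
            opened_at: None,
        }
    }

    /// Runs `op` unless the breaker is open and still cooling down.
    pub fn call<T, E>(
        &mut self,
        op: impl FnOnce() -> Result<T, E>,
    ) -> Result<T, BreakerError<E>> {
        if let Some(opened) = self.opened_at {
            if opened.elapsed() < self.cooldown {
                return Err(BreakerError::Open);
            }
            // Cooldown has elapsed: let one trial call through (half-open).
        }
        match op() {
            Ok(value) => {
                // Any success closes the breaker and resets the count.
                self.consecutive_failures = 0;
                self.opened_at = None;
                Ok(value)
            }
            Err(e) => {
                self.consecutive_failures += 1;
                if self.consecutive_failures >= self.failure_threshold {
                    self.opened_at = Some(Instant::now());
                }
                Err(BreakerError::Inner(e))
            }
        }
    }
}
```

Once the threshold is reached, callers get `BreakerError::Open` immediately instead of piling more load onto a failing dependency, which is the point of the pattern for cascading failures.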

Error Hierarchy

#[derive(thiserror::Error, Debug)]
pub enum AppError {
    // User-facing
    #[error("Invalid input: {0}")]
    Validation(String),

    // Transient (retryable)
    #[error("Service temporarily unavailable")]
    ServiceUnavailable(#[source] reqwest::Error),

    // Internal (log details, show generic)
    #[error("Internal error")]
    Internal(#[source] anyhow::Error),
}

impl AppError {
    pub fn is_retryable(&self) -> bool {
        matches!(self, Self::ServiceUnavailable(_))
    }
}
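
The "log details, show generic" comment implies a mapping from each variant to the text an end user actually sees. A minimal sketch of that mapping, using a simplified std-only stand-in named `DemoError` with `String` payloads in place of `reqwest::Error` and `anyhow::Error`; the `user_message` method and its wording are assumptions:

```rust
use std::fmt;

/// Simplified, hypothetical stand-in for the `AppError` enum.
#[derive(Debug)]
pub enum DemoError {
    Validation(String),
    ServiceUnavailable(String),
    Internal(String),
}

// Display keeps the full detail; it is meant for logs, not for end users.
impl fmt::Display for DemoError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DemoError::Validation(reason) => write!(f, "invalid input: {reason}"),
            DemoError::ServiceUnavailable(detail) => write!(f, "upstream unavailable: {detail}"),
            DemoError::Internal(detail) => write!(f, "internal error: {detail}"),
        }
    }
}

impl DemoError {
    /// Text that is safe to show to end users; internal detail is dropped.
    pub fn user_message(&self) -> String {
        match self {
            DemoError::Validation(reason) => format!("Invalid input: {reason}"),
            DemoError::ServiceUnavailable(_) => {
                "The service is temporarily unavailable. Please try again shortly.".to_string()
            }
            DemoError::Internal(_) => "Something went wrong on our side.".to_string(),
        }
    }
}
```

A handler would log `err.to_string()` and return `err.user_message()` in the response, so internal details never reach the client.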

Retry Pattern

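use std::future::Future;
use std::time::Duration;

use tokio_retry::{strategy::ExponentialBackoff, Retry};

// `impl Future` is not allowed in an `Fn` bound, so the future type gets
// its own generic parameter `Fut`.
async fn with_retry<F, Fut, T, E>(f: F) -> Result<T, E>
where
    F: FnMut() -> Fut,
    Fut: Future<Output = Result<T, E>>,
    E: std::fmt::Debug,
{
    // At most 5 retries after the first attempt, each delay capped at 10 s.
    // Note: tokio-retry multiplies the delay by the base on every step, so
    // with a base of 100 ms the second retry already reaches the cap.
    let strategy = ExponentialBackoff::from_millis(100)
        .max_delay(Duration::from_secs(10))
        .take(5);

    Retry::spawn(strategy, f).await
}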
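
The `with_retry` helper retries on every error. Combining the retry loop with an `is_retryable` check restricts retries to transient failures. A blocking, std-only sketch of that idea, assuming a hypothetical `Retryable` trait, `retry_transient` helper, and `FetchError` type (none of them are part of `tokio-retry`):

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical trait: errors report whether a retry could help.
pub trait Retryable {
    fn is_retryable(&self) -> bool;
}

/// Illustrative error with one transient and one permanent variant.
#[derive(Debug, PartialEq)]
pub enum FetchError {
    Timeout,
    BadRequest,
}

impl Retryable for FetchError {
    fn is_retryable(&self) -> bool {
        matches!(self, FetchError::Timeout)
    }
}

/// Calls `op` up to `max_attempts` times. After each retryable failure it
/// sleeps, then doubles the delay (capped at `max_delay`). Non-retryable
/// errors and the final failure are returned immediately.
pub fn retry_transient<T, E, F>(
    max_attempts: u32,
    initial_delay: Duration,
    max_delay: Duration,
    mut op: F,
) -> Result<T, E>
where
    E: Retryable,
    F: FnMut() -> Result<T, E>,
{
    let mut delay = initial_delay;
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(e) if attempt < max_attempts && e.is_retryable() => {
                thread::sleep(delay);
                delay = (delay * 2).min(max_delay);
                attempt += 1;
            }
            Err(e) => return Err(e),
        }
    }
}
```

With this shape, a `BadRequest` fails on the first call, while a `Timeout` is retried until it succeeds or the attempt budget runs out.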

Common Mistakes

| Mistake | Why Wrong | Better |
|---------|-----------|--------|
| Same error for all | No actionability | Categorize by audience |
| Retry everything | Wasted resources | Only transient errors |
| Infinite retry | DoS self | Max attempts + backoff |
| Expose internal errors | Security risk | User-friendly messages |
| No context | Hard to debug | .context() everywhere |

Anti-Patterns

| Anti-Pattern | Why Bad | Better |
|--------------|---------|--------|
| String errors | No structure | thiserror types |
| panic! for recoverable | Bad UX | Result with context |
| Ignore errors | Silent failures | Log or propagate |
| Box<dyn Error> everywhere | Lost type info | thiserror |
| Error in happy path | Performance | Early validation |

Related Skills

| When | See |
|------|-----|
| Error handling basics | m06-error-handling |
| Retry implementation | m07-concurrency |
| Domain modeling | m09-domain |
| User-facing APIs | domain-* |
