Structured Logging Skill

Structured Logging

Core Principle

Emit one wide event per request per service, not scattered log statements.

Build context throughout the request lifecycle
Emit once at the end (in a finally block)
Optimize for querying, not writing
Include high-cardinality fields (user_id, request_id) that enable debugging
Include high-dimensionality (50+ fields) capturing full context

Anti-patterns

Avoid these common mistakes:

// String concatenation - loses structure
log("User " + userId + " payment failed: " + error)

// Scattered statements - 17 lines for one request
log("Starting request")
log("Validating token")
log("Token valid")
log("Fetching user")
log("User found")
log("Processing payment")
log("Payment failed")

// Low cardinality only - can't debug specific users
log({level: "error", message: "Payment failed"})

// Missing business context - no debugging power
log({user_id: "123", error: "failed"})

Wide Event Structure

One comprehensive event per request:

{
  "timestamp": "2025-01-15T10:23:45.612Z",
  "level": "error",

  "request_id": "req_8bf7ec2d",
  "trace_id": "abc123",
  "span_id": "span_456",

  "service": "checkout-service",
  "version": "2.4.1",
  "deployment_id": "deploy_789",
  "region": "us-east-1",

  "method": "POST",
  "path": "/api/checkout",
  "status_code": 500,
  "duration_ms": 1247,

  "user": {
    "id": "user_456",
    "subscription": "premium",
    "account_age_days": 847,
    "lifetime_value_cents": 284700
  },

  "cart": {
    "id": "cart_xyz",
    "item_count": 3,
    "total_cents": 15999,
    "coupon_applied": "SAVE20"
  },

  "payment": {
    "method": "card",
    "provider": "stripe",
    "latency_ms": 1089,
    "attempt": 3
  },

  "error": {
    "type": "PaymentError",
    "code": "card_declined",
    "message": "Card declined by issuer",
    "retriable": false,
    "stripe_decline_code": "insufficient_funds"
  },

  "feature_flags": {
    "new_checkout_flow": true,
    "express_payment": false
  },

  "timing": {
    "db_query_ms": 45,
    "external_api_ms": 1089,
    "db_queries_count": 3,
    "cache_hit": false
  }
}

For the comprehensive field reference, see references/wide-event-fields.md.

Log Levels

Use log levels to indicate severity and required action:

| Level | When to Use | Examples | |-------|-------------|----------| | ERROR | Request failed, needs investigation | Unhandled exception, 5xx response, data corruption | | WARN | Degraded but recovered | Retry succeeded, fallback used, rate limit approaching, deprecated API called | | INFO | Normal request completion | The canonical log line for successful requests | | DEBUG | Detailed diagnostics | Cache lookups, query plans, intermediate state (usually sampled out in production) |

The wide event's level field reflects the worst outcome of the request:

Request succeeded normally: info
Request succeeded after retry/fallback: warn
Request failed: error

Examples

Bad: Scattered Logs

log.info("Incoming request", {path: "/checkout"})
log.debug("Fetching user")
log.debug("User found", {user_id: "123"})
log.debug("Fetching cart")
log.info("Processing payment")
log.warn("Payment attempt 1 failed, retrying")
log.warn("Payment attempt 2 failed, retrying")
log.error("Payment failed after 3 attempts")

Good: Single Wide Event

log.error({
  request_id: "req_abc",
  path: "/checkout",
  method: "POST",
  status_code: 500,
  duration_ms: 3200,
  user: {id: "123", subscription: "premium"},
  cart: {item_count: 3, total_cents: 15999},
  payment: {
    provider: "stripe",
    attempts: 3,
    latency_ms: 2800
  },
  error: {
    type: "PaymentError",
    code: "card_declined",
    retriable: false
  }
})

Bad: Missing Context

log.error("Database query failed", {error: "timeout"})

Good: Full Context

log.error({
  request_id: "req_xyz",
  user: {id: "user_789", subscription: "enterprise"},
  operation: "fetch_orders",
  database: {
    host: "db-prod-01",
    query_duration_ms: 30000,
    timeout_ms: 30000,
    table: "orders",
    rows_scanned: 1500000
  },
  error: {
    type: "DatabaseError",
    code: "QUERY_TIMEOUT",
    message: "Query exceeded 30s timeout",
    retriable: true,
    suggestion: "Add index on orders.user_id or paginate results"
  }
})

Implementation Pattern

Build the event throughout the request, emit once at the end:

function loggingMiddleware(handler):
    return function(request):
        start_time = now()

        // Initialize wide event
        event = {
            request_id: request.id,
            trace_id: request.trace_id,
            timestamp: start_time,
            method: request.method,
            path: request.path,
            service: SERVICE_NAME,
            version: SERVICE_VERSION
        }

        // Make event available to handlers
        request.wide_event = event

        try:
            response = handler(request)
            event.status_code = response.status
            event.level = "info"
            return response

        catch error:
            event.status_code = 500
            event.level = "error"
            event.error = {
                type: error.name,
                message: error.message,
                code: error.code,
                retriable: error.retriable ?? false
            }
            throw error

        finally:
            event.duration_ms = now() - start_time
            logger.log(event)

In handlers, enrich with business context:

function checkoutHandler(request):
    event = request.wide_event

    // Add user context
    user = getUser(request)
    event.user = {
        id: user.id,
        subscription: user.tier,
        account_age_days: daysSince(user.created_at),
        lifetime_value_cents: user.ltv
    }

    // Add business context
    cart = getCart(user.id)
    event.cart = {
        id: cart.id,
        item_count: cart.items.length,
        total_cents: cart.total
    }

    // Track timing for external calls
    payment_start = now()
    result = processPayment(cart)
    event.payment = {
        provider: "stripe",
        latency_ms: now() - payment_start,
        attempt: result.attempt_number
    }

    return response(result)

Querying Power

Wide events enable queries that scattered logs cannot:

-- Find all checkout failures for premium users with new checkout flow
SELECT * FROM logs
WHERE path = '/checkout'
  AND status_code >= 500
  AND user.subscription = 'premium'
  AND feature_flags.new_checkout_flow = true
  AND timestamp > now() - interval '1 hour'

-- Analyze payment failures by error code and user tier
SELECT
  error.code,
  user.subscription,
  count(*) as failures,
  avg(payment.latency_ms) as avg_latency
FROM logs
WHERE error.type = 'PaymentError'
GROUP BY error.code, user.subscription

-- Find slow requests for enterprise customers
SELECT * FROM logs
WHERE duration_ms > 2000
  AND user.subscription = 'enterprise'
ORDER BY duration_ms DESC

Agent Skills: Structured Logging

Install this agent skill to your local

Skill Files