
Observability

Implement the three pillars of observability: logs, metrics, and traces.

The Three Pillars

| Pillar | Purpose | Key Question |
|--------|---------|--------------|
| Logs | Discrete events with context | What happened? |
| Metrics | Aggregated measurements | How much/many? |
| Traces | Request flow across services | Where did time go? |

Quick Pick

Debug specific request? → Logs + Traces
Alert on thresholds? → Metrics
Understand system health? → All three
Starting from zero? → Logs first, then metrics, then traces

Key Principles

Use structured logging (JSON) with correlation IDs across all services
Instrument the four golden signals: latency, traffic, errors, saturation
Define SLIs/SLOs before building dashboards or alerts
Alert on symptoms (user impact), not causes (CPU usage)
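
The first principle can be sketched without any logging library. The example below is a minimal illustration, not a Pino setup: `resolveCorrelationId`, `createLogger`, the `x-correlation-id` header name, and the field names are all assumptions made for this sketch. It shows the core idea: every log line is a single JSON object, and every line written for one request carries the same correlation ID.

```typescript
type Level = "info" | "warn" | "error";
type Fields = Record<string, unknown>;

// Hypothetical helper: reuse the caller's ID if one was sent, otherwise mint one.
// The header name is an assumption; use whatever your services agree on.
function resolveCorrelationId(headers: Record<string, string | undefined>): string {
  const inbound = headers["x-correlation-id"];
  if (inbound) return inbound;
  // Stand-in ID generator for the sketch; in Node.js, crypto.randomUUID() is a better choice.
  return `${Date.now().toString(16)}-${Math.random().toString(16).slice(2, 10)}`;
}

// Hypothetical logger factory: binds a correlation ID to every line it writes.
// `sink` defaults to stdout and can be swapped out (for example, in tests).
function createLogger(correlationId: string, sink: (line: string) => void = console.log) {
  const write = (level: Level, msg: string, fields: Fields = {}): void => {
    // One JSON object per line, so log pipelines can parse and filter by field.
    sink(JSON.stringify({ ...fields, level, time: new Date().toISOString(), msg, correlationId }));
  };
  return {
    info: (msg: string, fields?: Fields) => write("info", msg, fields),
    warn: (msg: string, fields?: Fields) => write("warn", msg, fields),
    error: (msg: string, fields?: Fields) => write("error", msg, fields),
  };
}

// Usage: one logger per request, created in middleware and passed down.
const requestLog = createLogger(resolveCorrelationId({ "x-correlation-id": "req-123" }));
requestLog.info("order created", { orderId: 42 });
```

Forwarding the same ID on outbound calls lets a downstream service log it too, which is what makes a single request searchable across services.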

Quick Start Checklist

Set up structured logger (Pino recommended for Node.js)
Add request correlation IDs (middleware)
Instrument key metrics (RED: Rate, Errors, Duration)
Configure distributed tracing (OpenTelemetry)
Create dashboards for golden signals
Set up alerts with appropriate severity levels
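
As a rough illustration of the RED step in the checklist, the sketch below tracks Rate, Errors, and Duration in memory. `RedMetrics` and its method names are hypothetical, and the percentile uses a simple nearest-rank calculation over raw samples; in practice a Prometheus client library with counters and histograms would take over this role.

```typescript
// Hypothetical in-memory recorder for the RED signals of one endpoint.
class RedMetrics {
  private requests = 0;
  private errors = 0;
  private durationsMs: number[] = [];

  // Call once per completed request.
  record(durationMs: number, isError: boolean): void {
    this.requests += 1;
    if (isError) this.errors += 1;
    this.durationsMs.push(durationMs);
  }

  // Summarise everything recorded so far, assuming it spans `windowSeconds`.
  snapshot(windowSeconds: number) {
    const sorted = [...this.durationsMs].sort((a, b) => a - b);
    // Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
    const percentile = (p: number): number => {
      if (sorted.length === 0) return 0;
      const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
      return sorted[rank - 1];
    };
    return {
      ratePerSecond: this.requests / windowSeconds,                       // Rate
      errorRatio: this.requests === 0 ? 0 : this.errors / this.requests,  // Errors
      p50Ms: percentile(50),                                              // Duration
      p95Ms: percentile(95),
    };
  }
}

// Usage: four requests observed over a two-second window, one of them failing.
const red = new RedMetrics();
red.record(100, false);
red.record(200, false);
red.record(300, true);
red.record(400, false);
console.log(red.snapshot(2));
```

Percentiles are reported instead of a mean because a few slow requests can hide behind a healthy average, and those slow requests are usually the user-facing symptom worth alerting on.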

References

| Reference | Description |
|-----------|-------------|
| logging-patterns.md | Structured logging, log levels, Pino/Winston setup |
| metrics-guide.md | Prometheus, counters/gauges/histograms, golden signals |
| tracing-basics.md | OpenTelemetry, distributed tracing, span design |
| alerting-guide.md | Alert design, SLIs/SLOs, severity levels, dashboards |
