Agent Skills: Observability

Implement logging, metrics, tracing, and alerting for production applications. Covers structured logging (Pino, Winston), metrics (Prometheus, DataDog, CloudWatch), distributed tracing (OpenTelemetry), and alert design. Use this skill when adding logging to services, setting up monitoring, creating alerts, debugging production issues, or designing SLIs/SLOs. Triggers on "logging", "monitoring", "alerting", "observability", "metrics", "tracing", "debug production", "correlation id", "structured logging", "dashboards", "SLI", "SLO".

ID: srstomp/pokayokay/observability

Install this agent skill locally:

pnpm dlx add-skill https://github.com/srstomp/pokayokay/tree/HEAD/plugins/pokayokay/skills/observability

plugins/pokayokay/skills/observability/SKILL.md

Skill Metadata

Name: observability
Description: Use when adding logging to services, setting up monitoring, creating alerts, debugging production issues, designing SLIs/SLOs, or implementing structured logging (Pino, Winston), metrics (Prometheus, DataDog, CloudWatch), or distributed tracing (OpenTelemetry).

Observability

Implement the three pillars of observability: logs, metrics, and traces.

The Three Pillars

| Pillar | Purpose | Key Question |
|--------|---------|--------------|
| Logs | Discrete events with context | What happened? |
| Metrics | Aggregated measurements | How much/many? |
| Traces | Request flow across services | Where did time go? |

Quick Pick

  • Debug specific request? → Logs + Traces
  • Alert on thresholds? → Metrics
  • Understand system health? → All three
  • Starting from zero? → Logs first, then metrics, then traces

Key Principles

  • Use structured logging (JSON) with correlation IDs across all services
  • Instrument the four golden signals: latency, traffic, errors, saturation
  • Define SLIs/SLOs before building dashboards or alerts
  • Alert on symptoms (user impact), not causes (CPU usage)
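
As a rough illustration of the first principle, the sketch below emits one JSON object per log line and stamps a correlation ID on every entry through a child logger, similar in spirit to Pino's `child()` bindings. It uses no logging library: `JsonLogger`, `correlationIdFor`, the `x-correlation-id` header name, and the `checkout` service name are assumptions made for this example, not part of the skill or of any library's API.

```typescript
type Level = "debug" | "info" | "warn" | "error";
type Fields = Record<string, unknown>;

// Minimal structured logger (illustrative): one JSON object per line.
class JsonLogger {
  private readonly bindings: Fields;
  private readonly write: (line: string) => void;

  constructor(
    bindings: Fields = {},
    write: (line: string) => void = (line) => { console.log(line); },
  ) {
    this.bindings = bindings;
    this.write = write;
  }

  // Return a logger that stamps extra fields (e.g. correlationId) on every entry.
  child(extra: Fields): JsonLogger {
    return new JsonLogger({ ...this.bindings, ...extra }, this.write);
  }

  // Core fields are spread last so per-call fields cannot overwrite them.
  log(level: Level, msg: string, fields: Fields = {}): void {
    this.write(JSON.stringify({
      ...this.bindings,
      ...fields,
      time: new Date().toISOString(),
      level,
      msg,
    }));
  }
}

// Hypothetical middleware step: reuse the caller's ID if present so the same
// value follows the request across services, otherwise mint a new one.
function correlationIdFor(headers: Record<string, string | undefined>): string {
  const inbound = headers["x-correlation-id"]; // header name is an assumption
  if (inbound !== undefined) return inbound;
  const randomPart = Math.random().toString(36).slice(2, 10);
  return `req-${Date.now().toString(36)}-${randomPart}`;
}

// Usage: bind the ID once per request, then log normally.
const rootLogger = new JsonLogger({ service: "checkout" });
const requestLogger = rootLogger.child({
  correlationId: correlationIdFor({ "x-correlation-id": "abc-123" }),
});
requestLogger.log("info", "order created", { orderId: 42, durationMs: 18 });
```

In a real service, a library such as Pino or Winston would likely replace `JsonLogger`. The point of the sketch is only that the ID is bound once per request rather than passed to every log call.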

Quick Start Checklist

  1. Set up structured logger (Pino recommended for Node.js)
  2. Add request correlation IDs (middleware)
  3. Instrument key metrics (RED: Rate, Errors, Duration)
  4. Configure distributed tracing (OpenTelemetry)
  5. Create dashboards for golden signals
  6. Set up alerts with appropriate severity levels
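
Steps 3 and 6 can be sketched together without any metrics backend. The example below keeps RED metrics (Rate, Errors, Duration) in memory and evaluates a symptom-based alert condition on error rate and p95 latency. `RedMetrics`, `shouldAlert`, and the 1% and 300 ms thresholds are illustrative assumptions; a production setup would more likely export counters and histograms to Prometheus, DataDog, or CloudWatch and evaluate alerts there.

```typescript
// In-memory RED metrics for a single route (illustrative only).
class RedMetrics {
  private requests = 0;
  private errors = 0;
  private readonly durationsMs: number[] = [];

  // Record one finished request.
  observe(durationMs: number, isError: boolean): void {
    this.requests += 1;
    if (isError) this.errors += 1;
    this.durationsMs.push(durationMs);
  }

  // Rate: requests per second over a window of the given length.
  ratePerSecond(windowSeconds: number): number {
    return this.requests / windowSeconds;
  }

  // Errors: fraction of requests that failed (0 when nothing was recorded).
  errorRate(): number {
    return this.requests === 0 ? 0 : this.errors / this.requests;
  }

  // Duration: nearest-rank percentile of recorded durations, p in (0, 100].
  percentile(p: number): number {
    if (this.durationsMs.length === 0) return 0;
    const sorted = [...this.durationsMs].sort((a, b) => a - b);
    const rank = Math.max(Math.ceil((p / 100) * sorted.length), 1);
    return sorted[rank - 1] ?? 0;
  }
}

// Symptom-based check: fire on user-visible impact (failed or slow requests),
// not on a cause such as CPU usage. Thresholds are assumed values.
function shouldAlert(m: RedMetrics, maxErrorRate = 0.01, maxP95Ms = 300): boolean {
  return m.errorRate() > maxErrorRate || m.percentile(95) > maxP95Ms;
}

// Usage: record a few requests, then evaluate the alert condition.
const metrics = new RedMetrics();
metrics.observe(40, false);
metrics.observe(55, false);
metrics.observe(900, true);
console.log({
  rate: metrics.ratePerSecond(60),
  errorRate: metrics.errorRate(),
  p95Ms: metrics.percentile(95),
  alert: shouldAlert(metrics),
});
```

Storing every duration is fine for a sketch but grows without bound. Real metrics libraries typically use bucketed histograms for the same purpose.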

References

| Reference | Description |
|-----------|-------------|
| logging-patterns.md | Structured logging, log levels, Pino/Winston setup |
| metrics-guide.md | Prometheus, counters/gauges/histograms, golden signals |
| tracing-basics.md | OpenTelemetry, distributed tracing, span design |
| alerting-guide.md | Alert design, SLIs/SLOs, severity levels, dashboards |