Agent Skills: Observability Control

Manage observability stack lifecycle (start, stop, backup, restore, upgrade). Use when controlling the LGTM stack for Claude Code monitoring.

Install this agent skill locally:

pnpm dlx add-skill https://github.com/adaptationio/Skrillz/tree/HEAD/.claude/skills/observability-control

Observability Control

Manage the lifecycle of the observability stack for Claude Code telemetry.

Stack Locations

| Environment | Docker Compose Path |
|-------------|---------------------|
| Primary Stack | /mnt/c/data/github/botaniqal-medtech/botaniqal-medtech/observability/docker-compose.yml |
| Skill-based Stack | /mnt/c/data/github/.observability/docker-compose.yml |
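
With two stack locations, a small helper that picks whichever compose file is present can save retyping the long paths. The following is a minimal sketch, not part of the skill: the function name `pick_compose_file` and the variable `STACK_COMPOSE_FILE` are hypothetical.

```shell
# Hypothetical helper: print the first argument that is an existing regular file.
pick_compose_file() {
  for candidate in "$@"; do
    if [ -f "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Try the primary stack first, then the skill-based stack.
if STACK_COMPOSE_FILE=$(pick_compose_file \
    /mnt/c/data/github/botaniqal-medtech/botaniqal-medtech/observability/docker-compose.yml \
    /mnt/c/data/github/.observability/docker-compose.yml); then
  echo "using $STACK_COMPOSE_FILE"
else
  echo "no compose file found" >&2
fi
```

The result can then be passed to any of the operations below, for example `docker compose -f "$STACK_COMPOSE_FILE" up -d`.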

Components

| Service | Port | Purpose |
|---------|------|---------|
| Grafana | 3000 | Dashboards and visualization |
| Prometheus | 9090 | Metrics storage |
| Loki | 3100 | Log aggregation |
| Tempo | 3200 | Distributed tracing |
| OTEL Collector | 4317/4318 | Telemetry receiver |
| Promtail | - | Log shipping |

Operations

start

Start observability stack.

docker compose -f /mnt/c/data/github/botaniqal-medtech/botaniqal-medtech/observability/docker-compose.yml up -d
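
`up -d` returns as soon as the containers are created, so the services may not be answering yet. A minimal sketch of waiting for readiness, assuming a generic retry helper (the name `retry` and its argument order are hypothetical):

```shell
# Hypothetical helper: run a command until it succeeds.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while :; do
    # Stop as soon as the command succeeds.
    "$@" && return 0
    # Give up once the attempt budget is used.
    [ "$i" -ge "$attempts" ] && return 1
    i=$((i + 1))
    sleep "$delay"
  done
}
```

For example, `retry 30 2 curl -sf -o /dev/null http://localhost:3000/api/health` polls Grafana for roughly a minute; the `-f` flag makes curl exit non-zero on HTTP errors, so the loop keeps waiting until the endpoint answers successfully.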

stop

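Stop the stack gracefully. The down command removes containers and networks but keeps named volumes, so data is preserved; adding -v would delete the volumes.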

docker compose -f /mnt/c/data/github/botaniqal-medtech/botaniqal-medtech/observability/docker-compose.yml down

restart [service]

Restart specific service or all services.

# Restart all
docker compose -f /path/docker-compose.yml restart

# Restart specific
docker restart loki

status

Show container status for all components.

docker ps --format "table {{.Names}}\t{{.Status}}" | grep -E "(otel|loki|grafana|prometheus|tempo|promtail)"

Output: Running services, health status.
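
To surface only the containers that need attention, the status output can be filtered. This is a sketch under assumptions: the helper name `flag_problems` is hypothetical, and it relies on Docker's usual status strings ("Up ...", "Exited ...", "Restarting ...", with "(unhealthy)" appended when a health check fails).

```shell
# Hypothetical helper: read "name<TAB>status" lines and print containers that are
# not up, or that Docker marks as unhealthy.
flag_problems() {
  awk -F '\t' '$2 !~ /^Up/ || $2 ~ /unhealthy/ { print $1 ": " $2 }'
}

# -a includes stopped containers, which plain `docker ps` hides.
if command -v docker >/dev/null 2>&1; then
  docker ps -a --format '{{.Names}}\t{{.Status}}' | flag_problems
fi
```

This lists every container on the host, so on a shared machine the output may need the same grep filter used above.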

health

Verify service endpoints.

curl -s http://localhost:3000/api/health  # Grafana
curl -s http://localhost:9090/-/healthy   # Prometheus
curl -s http://localhost:3100/ready       # Loki
curl -s http://localhost:3200/ready       # Tempo
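
The four checks above can be folded into a loop that prints one line per service. A minimal sketch, assuming a hypothetical helper named `probe`:

```shell
# Hypothetical helper: print a service name next to the HTTP status of its health URL.
# curl reports 000 when it cannot connect, so a stopped service shows as 000.
probe() {
  code=$(curl -s -o /dev/null -m 5 -w '%{http_code}' "$2")
  printf '%-11s %s\n' "$1" "$code"
}

probe Grafana    http://localhost:3000/api/health
probe Prometheus http://localhost:9090/-/healthy
probe Loki       http://localhost:3100/ready
probe Tempo      http://localhost:3200/ready
```

A 200 means the endpoint answered as healthy. Loki and Tempo may report 503 on /ready for a short while after startup, so a single 503 right after starting the stack is not necessarily a fault.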

backup

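Export dashboards and configurations. The commands below cover dashboards only and write one JSON document per dashboard into a single file.

# Backup dashboards (type=dash-db skips folders, which have no dashboard JSON to fetch)
mkdir -p backup
curl -s "http://localhost:3000/api/search?type=dash-db" -u admin:admin | \
  jq -r '.[].uid' | \
  xargs -I {} curl -s http://localhost:3000/api/dashboards/uid/{} -u admin:admin > backup/dashboards.json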

Output: .observability/backups/YYYYMMDD_HHMMSS/
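
The output path follows a YYYYMMDD_HHMMSS pattern. A minimal sketch of creating such a directory, assuming the backup root is passed as an argument; the function name `new_backup_dir` is hypothetical, and the bundled scripts/backup-dashboards.sh may do this differently.

```shell
# Hypothetical helper: create <root>/YYYYMMDD_HHMMSS and print its path.
new_backup_dir() {
  dir="$1/$(date +%Y%m%d_%H%M%S)"
  mkdir -p "$dir" && printf '%s\n' "$dir"
}
```

For example, `BACKUP_DIR=$(new_backup_dir .observability/backups)` gives each run its own directory, so an earlier backup is never overwritten.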

restore <backup-path>

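Restore from backup. The dashboards API accepts one dashboard per request, wrapped in a "dashboard" key, so each saved document is posted separately.

# Post one dashboard at a time; id is nulled so Grafana matches on uid, and overwrite replaces an existing copy
jq -c 'select(.dashboard) | {dashboard: (.dashboard | .id = null), overwrite: true}' backup/dashboards.json | \
  while IFS= read -r body; do
    printf '%s' "$body" | curl -s -X POST http://localhost:3000/api/dashboards/db \
      -H "Content-Type: application/json" -u admin:admin --data-binary @-
  done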

logs [service]

View logs from stack components.

docker logs loki --tail 100
docker logs otel-collector --tail 100
docker logs grafana --tail 100

fix-permissions

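Fix volume permission issues (common with Tempo). This recreates the volume, so stored traces are deleted. Docker refuses to remove a volume that a container still references, so the Tempo container is removed first and recreated by Compose at the end.

# Remove the container so the volume is no longer in use
docker rm -f tempo
docker volume rm observability_tempo-data
docker volume create observability_tempo-data
docker run --rm -v observability_tempo-data:/tempo alpine chown -R 10001:10001 /tempo
# Recreate the removed container
docker compose -f /path/docker-compose.yml up -d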

Quick Commands

# Check all services status
docker ps | grep -E "(otel|loki|grafana|prometheus|tempo|promtail)"

# View recent logs for issues
docker logs otel-collector --tail 50 2>&1 | grep -i error

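# Test OTLP endpoints (4317 speaks gRPC, so curl only confirms that the port accepts connections)
curl -v http://localhost:4317
# OTLP/HTTP on 4318 takes a JSON POST; an empty payload should be enough for a smoke test
curl -s -X POST http://localhost:4318/v1/traces -H "Content-Type: application/json" -d '{}'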

# Query Loki for recent data
curl -s "http://localhost:3100/loki/api/v1/labels"

# List Grafana dashboards
curl -s http://localhost:3000/api/search -u admin:admin | python3 -c "import sys,json; [print(d['title']) for d in json.load(sys.stdin)]"

Troubleshooting

OTEL Collector Unhealthy

docker logs otel-collector --tail 30
# Common fix: Ensure Prometheus has --web.enable-remote-write-receiver

Loki Unhealthy

docker logs loki --tail 30
# Common fix: Disable frontend_worker for single-node mode

Tempo Permission Denied

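# Fix volume permissions (deletes stored traces; the container must be removed before its volume)
docker rm -f tempo
docker volume rm observability_tempo-data
docker volume create observability_tempo-data
docker run --rm -v observability_tempo-data:/tempo alpine chown -R 10001:10001 /tempo
# Recreate the removed container
docker compose -f /path/docker-compose.yml up -d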

No Data in Grafana

  1. Check telemetry env vars: env | grep OTEL
  2. Check hooks configured: cat .claude/settings.json
  3. Verify Loki receiving: curl "http://localhost:3100/loki/api/v1/labels"
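
Steps 1 and 3 of the checklist above can be scripted; step 2 is left as a manual inspection of the settings file. A minimal sketch, where the function names `check_otel_env` and `loki_labels_ok` are hypothetical:

```shell
# Step 1: list OTEL_* variables visible to this shell; fail when none are set.
check_otel_env() {
  if env | grep '^OTEL_'; then
    return 0
  fi
  echo "no OTEL_* variables set" >&2
  return 1
}

# Step 3: read a Loki labels response on stdin and succeed when Loki reports success.
loki_labels_ok() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"success"'
}

check_otel_env || true
if curl -s -m 5 "http://localhost:3100/loki/api/v1/labels" | loki_labels_ok; then
  echo "Loki answered the labels query"
else
  echo "Loki did not return a successful labels response" >&2
fi
```

A successful response that lists no labels suggests Loki is running but has not ingested anything yet, which points back at the telemetry variables or the collector rather than at Loki itself.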

Access Points

| Service | URL | Credentials |
|---------|-----|-------------|
| Grafana | http://localhost:3000 | admin/admin |
| Prometheus | http://localhost:9090 | - |
| Loki | http://localhost:3100 | - |
| OTLP gRPC | localhost:4317 | - |
| OTLP HTTP | localhost:4318 | - |

Scripts

  • scripts/start-stack.sh - Start observability stack
  • scripts/stop-stack.sh - Stop stack gracefully
  • scripts/health-check.sh - Check all service health
  • scripts/backup-dashboards.sh - Export Grafana dashboards
  • scripts/restore-dashboards.sh - Import dashboards from backup