Agent Skills: Observability Skill

Logging, metrics, and distributed tracing. OpenTelemetry, Prometheus, Grafana, and production debugging.

loggingmetricsdistributed-tracingopentelemetryprometheusgrafanaproduction-debugging
backendID: pluginagentmarketplace/custom-plugin-backend/observability

Skill Files

Browse the full folder contents for observability.

Download Skill

Loading file tree…

skills/observability/SKILL.md

Skill Metadata

Name
observability
Description
Logging, metrics, and distributed tracing. OpenTelemetry, Prometheus, Grafana, and production debugging.

Observability Skill

Bonded to: backend-observability-agent (Primary)


Quick Start

# Invoke observability skill
"Set up structured logging for my application"
"Configure Prometheus metrics"
"Implement distributed tracing with OpenTelemetry"

Three Pillars

| Pillar | Purpose | Tools | |--------|---------|-------| | Logs | What happened | ELK, Loki | | Metrics | System state | Prometheus, Grafana | | Traces | Request flow | Jaeger, OpenTelemetry |


Examples

Structured Logging

import structlog

structlog.configure(
    processors=[
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.JSONRenderer()
    ]
)

logger = structlog.get_logger()

def process_request(request):
    log = logger.bind(
        correlation_id=request.headers.get("X-Correlation-ID"),
        user_id=request.user.id
    )
    log.info("request_started", method=request.method)

Prometheus Metrics

from prometheus_client import Counter, Histogram

REQUEST_COUNT = Counter('http_requests_total', 'Total requests', ['method', 'status'])
REQUEST_LATENCY = Histogram('http_request_duration_seconds', 'Request latency')

@REQUEST_LATENCY.time()
async def handle_request(request):
    response = await process(request)
    REQUEST_COUNT.labels(method=request.method, status=response.status_code).inc()
    return response

OpenTelemetry Tracing

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

@tracer.start_as_current_span("process_order")
def process_order(order_id: str):
    span = trace.get_current_span()
    span.set_attribute("order.id", order_id)

    with tracer.start_as_current_span("validate"):
        validate(order_id)

    with tracer.start_as_current_span("charge"):
        charge(order_id)

Key Metrics (RED Method)

| Metric | Description | Alert Threshold | |--------|-------------|-----------------| | Rate | Requests/sec | Drop > 50% | | Errors | Error rate % | > 1% for 5 min | | Duration | P99 latency | > 500ms for 5 min |


Troubleshooting

| Issue | Cause | Solution | |-------|-------|----------| | Missing logs | Log level too high | Adjust log level | | High cardinality | Too many labels | Reduce label values | | Broken traces | Context not propagated | Forward headers |


Resources