holmesgpt-skill
Guide for implementing HolmesGPT - an AI agent for troubleshooting cloud-native environments. Use when investigating Kubernetes issues, analyzing alerts from Prometheus/AlertManager/PagerDuty, performing root cause analysis, configuring HolmesGPT installations (CLI/Helm/Docker), setting up AI providers (OpenAI/Anthropic/Azure), creating custom toolsets, or integrating with observability platforms (Grafana, Loki, Tempo, DataDog).
mimir
Guide for implementing Grafana Mimir - a horizontally scalable, highly available, multi-tenant TSDB for long-term storage of Prometheus metrics. Use when configuring Mimir on Kubernetes, setting up Azure/S3/GCS storage backends, troubleshooting authentication issues, or optimizing performance.
prometheus-api
Query and interact with Prometheus HTTP API for monitoring data. Use when Claude needs to query Prometheus metrics, execute PromQL queries, retrieve targets/alerts/rules status, access metadata about series/labels, manage TSDB operations, or troubleshoot monitoring infrastructure. Supports instant queries, range queries, metadata endpoints, admin APIs, and alerting information.
aggregating-gauge-metrics
Aggregate pre-computed metrics (gauge, counter, delta types) using OPAL. Use when analyzing request counts, error rates, resource utilization, or any numeric metrics over time. Covers align + m() + aggregate pattern, summary vs time-series output, and common aggregation functions. For percentile metrics (tdigest), see analyzing-tdigest-metrics skill.
dapr-observability-setup
Configure OpenTelemetry tracing, metrics, and structured logging for DAPR applications. Integrates with Azure Monitor, Jaeger, Prometheus, and other observability backends.
monitoring-observability
Prometheus, Grafana, logging, alerting, and data pipeline observability
observability
Logging, metrics, and distributed tracing. OpenTelemetry, Prometheus, Grafana, and production debugging.
monitoring
Master Kubernetes observability, monitoring with Prometheus, logging, metrics, and distributed tracing. Learn to implement comprehensive monitoring strategies.
prometheus-configuration
Set up Prometheus for comprehensive metric collection, storage, and monitoring of infrastructure and applications. Use when implementing metrics collection, setting up monitoring infrastructure, or configuring alerting systems.
monitoring-skill
Monitoring and observability with Prometheus, Grafana, ELK Stack, and distributed tracing.
prometheus
Prometheus monitoring and alerting for cloud-native observability. Use when implementing metrics collection, PromQL queries, alerting rules, or service discovery. Triggers: prometheus, promql, metrics, alertmanager, service discovery, recording rules, alerting, scrape config.
Prometheus Analysis
This skill should be used when the user asks to "query Prometheus", "analyze Prometheus metrics", "check Prometheus alerts", "write PromQL", "interpret Prometheus data", "fetch metrics", or mentions Prometheus querying, alerting, or metrics analysis. Provides guidance for querying and interpreting Prometheus metrics for root cause analysis.
prometheus-go-code-review
Reviews Prometheus instrumentation in Go code for proper metric types, labels, and patterns. Use when reviewing code with prometheus/client_golang metrics.
infrastructure-monitoring
Set up comprehensive infrastructure monitoring with Prometheus, Grafana, and alerting systems for metrics, health checks, and performance tracking.
Model Monitoring
Monitor model performance, detect data drift, concept drift, and anomalies in production using Prometheus, Grafana, and MLflow
prometheus-monitoring
Set up Prometheus monitoring for applications with custom metrics, scraping configurations, and service discovery. Use when implementing time-series metrics collection, monitoring applications, or building observability infrastructure.