Infrastructure Monitoring Skill


Overview
When to Use
Quick Start
Reference Guides
Best Practices

Overview

Set up infrastructure monitoring that tracks system health, performance metrics, and resource utilization across your stack. Prometheus collects metrics and evaluates alert rules, Alertmanager routes the resulting alerts, and Grafana provides visualization.

When to Use

Real-time performance monitoring
Capacity planning and trend analysis
Incident detection and alerting
Service health tracking
Resource utilization analysis
Performance troubleshooting
Compliance and audit trails
Historical data analysis
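Capacity planning, utilization analysis, and historical queries tend to reuse the same expensive expressions. Precomputing those expressions with recording rules is a common approach. The sketch below shows what the `rules.yml` loaded by the Quick Start config might contain. It assumes each host runs node_exporter, so that `node_cpu_seconds_total`, `node_memory_*` and `node_filesystem_*` series exist. The group name and `record` names are hypothetical and follow the `level:metric:operations` naming convention.

```yaml
# rules.yml: hypothetical recording rules that precompute per-instance
# utilisation ratios from node_exporter metrics, so dashboards and
# capacity queries read one cheap series instead of the full expression.
groups:
  - name: node-utilisation
    interval: 1m
    rules:
      # Fraction of CPU time not spent idle, averaged across all cores.
      - record: instance:node_cpu_utilisation:rate5m
        expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))

      # Fraction of memory in use; MemAvailable is the kernel's estimate
      # of memory available to new workloads without swapping.
      - record: instance:node_memory_utilisation:ratio
        expr: 1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes

      # Fraction of each real filesystem that is full (tmpfs/overlay excluded).
      - record: instance_mountpoint:node_filesystem_utilisation:ratio
        expr: |
          1 - node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"}
            / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"}
```

Running `promtool check rules rules.yml` validates the syntax before Prometheus reloads the file.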

Quick Start

A minimal `prometheus.yml` that scrapes Prometheus itself, loads rules from `alerts.yml` and `rules.yml`, and sends alerts to an Alertmanager on `localhost:9093`:

```yaml
# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  # Added to alerts and series sent to external systems (e.g. Alertmanager)
  external_labels:
    monitor: "infrastructure-monitor"
    environment: "production"

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093

# Rule files
rule_files:
  - "alerts.yml"
  - "rules.yml"

scrape_configs:
  # Prometheus itself
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
  # ... (see reference guides for full implementation)
```
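The Quick Start config above only scrapes Prometheus itself. To monitor the machines in your stack, you add a scrape job for each exporter under `scrape_configs`. The sketch below assumes node_exporter runs on each host at its default port, 9100. The hostnames and the `role` label are hypothetical placeholders.

```yaml
scrape_configs:
  # ... existing "prometheus" job ...

  # Host-level metrics from node_exporter (hostnames are placeholders).
  - job_name: "node"
    scrape_interval: 30s            # overrides the global 15s for this job only
    static_configs:
      - targets:
          - "node1.example.internal:9100"
          - "node2.example.internal:9100"
        labels:
          role: "app-server"        # hypothetical label attached to both targets
```

To validate the merged file, run `promtool check config prometheus.yml`. Prometheus then picks up the change on restart or reload, for example via SIGHUP, or a POST to `/-/reload` when it was started with `--web.enable-lifecycle`.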

Reference Guides

Detailed implementations in the references/ directory:

| Guide | Contents |
|---|---|
| Prometheus Configuration | Prometheus Configuration |
| Alert Rules | Alert Rules |
| Alertmanager Configuration | Alertmanager Configuration |
| Grafana Dashboard | Grafana Dashboard |
| Monitoring Deployment | Monitoring Deployment |
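The sketch below illustrates the kind of rules the Alert Rules guide deals with: a possible `alerts.yml` for the `rule_files` entry in the Quick Start config. The alert names, thresholds, `for` durations and `severity` values are assumptions to tune for your environment. `InstanceDown` works with any scraped target. The CPU and disk rules need node_exporter metrics.

```yaml
# alerts.yml: illustrative alerting rules; thresholds and severities are assumptions.
groups:
  - name: infrastructure-alerts
    rules:
      # Target whose scrapes have been failing for 5 minutes.
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.instance }} is down"
          description: "Job {{ $labels.job }} has failed to scrape {{ $labels.instance }} for over 5 minutes."

      # Sustained CPU saturation (requires node_exporter).
      - alert: HighCpuUsage
        expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) > 0.9
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High CPU on {{ $labels.instance }}"
          description: "CPU utilisation is {{ $value | humanizePercentage }} (above 90% for 10 minutes)."

      # Linear extrapolation of the last 6h predicts the filesystem fills within 24h.
      - alert: DiskWillFillIn24h
        expr: predict_linear(node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"}[6h], 24 * 3600) < 0
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.mountpoint }} on {{ $labels.instance }} may fill within 24h"
```

Because every rule carries a `severity` label, Alertmanager can route critical and warning alerts to different receivers.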

Best Practices

✅ DO

❌ DON'T