Agent Skills: Server Monitoring

Game server monitoring with metrics, alerting, and performance tracking for production reliability

Uncategorized
ID: pluginagentmarketplace/custom-plugin-server-side-game-dev/monitoring

skills/monitoring/SKILL.md

Server Monitoring

Monitor game server health with metrics, logs, and alerts.

Key Game Metrics

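const prometheus = require('prom-client');

// Player metrics
const activePlayers = new prometheus.Gauge({
  name: 'game_active_players',
  help: 'Currently connected players',
  labelNames: ['region', 'game_mode']
});

const matchesInProgress = new prometheus.Gauge({
  name: 'game_matches_active',
  help: 'Active matches',
  labelNames: ['game_mode']
});

// Performance metrics use base units (seconds), per Prometheus naming conventions.
// Buckets bracket the 60 Hz tick budget (~16.7 ms) and the 20 ms alert threshold.
const tickDuration = new prometheus.Histogram({
  name: 'game_tick_duration_seconds',
  help: 'Game loop tick duration in seconds',
  buckets: [0.001, 0.005, 0.01, 0.016, 0.02, 0.033, 0.05, 0.1]
});

// Buckets extend past the 200 ms alert threshold: histogram_quantile()
// cannot report a value above the highest finite bucket boundary.
const networkLatency = new prometheus.Histogram({
  name: 'game_network_latency_seconds',
  help: 'Player network latency in seconds',
  labelNames: ['region'],
  buckets: [0.01, 0.025, 0.05, 0.075, 0.1, 0.15, 0.2, 0.3, 0.5, 1]
});

module.exports = { activePlayers, matchesInProgress, tickDuration, networkLatency };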
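These metrics only report what the game code feeds them. A minimal sketch of wiring them in, assuming the server has a per-tick step function and connect/disconnect hooks; `timedTick` and `trackPlayer` are hypothetical helper names, and the metric objects are passed in so the helpers can be tested with fakes. The only prom-client calls relied on are `Histogram.startTimer()`, which returns an `end` function, and `Gauge.inc()` / `Gauge.dec()` with a label object.

```javascript
// Hypothetical helpers; `histogram` and `gauge` are prom-client metrics such as
// tickDuration and activePlayers (or any object exposing the same methods).

// Run one game tick and record its wall-clock duration in the histogram.
// The timer stops in `finally`, so a tick that throws is still measured.
function timedTick(histogram, tickFn) {
  const end = histogram.startTimer();
  try {
    return tickFn();
  } finally {
    end();
  }
}

// Increment the player gauge on connect and return a disconnect handler that
// decrements it. The handler is idempotent, so a duplicate disconnect event
// cannot push the gauge below the real player count.
function trackPlayer(gauge, labels) {
  gauge.inc(labels);
  let connected = true;
  return function onDisconnect() {
    if (!connected) return;
    connected = false;
    gauge.dec(labels);
  };
}
```

In a server loop this might look like `timedTick(tickDuration, () => world.step())` once per iteration and `const onDisconnect = trackPlayer(activePlayers, { region: 'eu-west', game_mode: 'ranked' })` on connect, where `world` and the label values are placeholders. The label keys must match the gauge's `labelNames`.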

Alert Rules

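groups:
- name: game-alerts
  rules:
  # A scrape target in the game-servers job is unreachable
  - alert: GameServerDown
    expr: up{job="game-servers"} == 0
    for: 1m
    labels:
      severity: critical

  # Per-server p99 tick duration over the last 5 minutes exceeds 20 ms.
  # histogram_quantile() needs the rate of the _bucket series, aggregated by le.
  - alert: HighTickLatency
    expr: histogram_quantile(0.99, sum by (instance, le) (rate(game_tick_duration_seconds_bucket[5m]))) > 0.02
    for: 5m
    labels:
      severity: high

  # Total connected players across all regions and game modes
  - alert: LowPlayerCount
    expr: sum(game_active_players) < 10
    for: 10m
    labels:
      severity: warning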
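The `HighTickLatency` rule depends on how Prometheus estimates a quantile from cumulative `_bucket` counters. The function below is a simplified re-implementation of that estimation for illustration only (`histogramQuantile` is a hypothetical name, not a Prometheus API). In PromQL the input is normally `rate()` of the `_bucket` series summed by `le`; the sketch takes those already-aggregated cumulative counts, finds the bucket containing the target rank, and interpolates linearly inside it.

```javascript
// Simplified sketch of Prometheus-style quantile estimation from cumulative
// histogram buckets. `buckets` is sorted by `le`, counts are cumulative, and
// the last bucket must have le = Infinity (the "+Inf" bucket).
function histogramQuantile(q, buckets) {
  if (q < 0) return -Infinity;
  if (q > 1) return Infinity;
  const n = buckets.length;
  if (n < 2 || buckets[n - 1].le !== Infinity) return NaN;

  const total = buckets[n - 1].count;
  if (total === 0) return NaN;
  const rank = q * total;

  // First bucket whose cumulative count reaches the rank.
  const b = buckets.findIndex((bucket) => bucket.count >= rank);

  // Rank lands in +Inf: the estimate is capped at the highest finite bound.
  if (b === n - 1) return buckets[n - 2].le;

  const start = b === 0 ? 0 : buckets[b - 1].le;
  const below = b === 0 ? 0 : buckets[b - 1].count;
  const inBucket = buckets[b].count - below;
  // Linear interpolation between the bucket's lower and upper bounds.
  return start + (buckets[b].le - start) * ((rank - below) / inBucket);
}
```

Two consequences follow. The estimate is only as precise as the bucket boundaries around the alert threshold, and when every observation lands in `+Inf` (for example buckets `0.1: 0, 0.2: 0, +Inf: 10`) any quantile above zero comes back as `0.2`, so a `> 0.2` condition can never fire. This is why a latency histogram needs finite buckets above its 200 ms alert threshold.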

Target Thresholds

| Metric | Target | Alert threshold |
|--------|--------|-----------------|
| Tick rate | 60 Hz | < 55 Hz |
| Latency (P99) | < 100 ms | > 200 ms |
| Memory usage | < 80% | > 90% |
| CPU usage | < 70% | > 85% |
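Tick rate and tick duration are reciprocals, which connects the 60 Hz target in this table to the seconds-based `game_tick_duration_seconds` histogram and the 0.02 s `HighTickLatency` threshold:

```latex
t_{\text{tick}} = \frac{1}{f_{\text{tick}}}, \qquad
\frac{1}{60\,\text{Hz}} \approx 16.7\,\text{ms}, \qquad
\frac{1}{55\,\text{Hz}} \approx 18.2\,\text{ms}, \qquad
\frac{1}{0.02\,\text{s}} = 50\,\text{Hz}
```

So a server sustaining 60 Hz must finish each tick in about 16.7 ms, and a p99 tick duration above 20 ms means the slowest 1% of ticks would not fit even a 50 Hz schedule.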

Troubleshooting

Common Failure Modes

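| Symptom | Root cause | Solution |
|---------|------------|----------|
| Missing metrics | Scrape failures | Check target health and `lastError` via the targets API |
| Alert storms | Thresholds too sensitive | Tune thresholds and lengthen `for:` durations |
| Slow dashboards | Too many or too expensive queries | Pre-aggregate with recording rules |
| Gaps in data | Network issues between Prometheus and targets | Add redundancy (for example, a second Prometheus scraping the same targets) |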
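For alert storms, the `for:` field in the alert rules is the main damping knob: Prometheus keeps an alert pending while its expression returns a result, fires it only once the expression has stayed true for the whole `for` duration, and drops it back to inactive on any evaluation where the expression is false. The sketch below simulates that behaviour over a series of evaluation results to show why short spikes do not page anyone; `simulateForClause` and the fixed evaluation interval are illustrative assumptions, not Prometheus code.

```javascript
// Illustrative simulation of Prometheus alert states for a rule with
// `for: forSec`, evaluated every `intervalSec` seconds. `results[i]` says
// whether the alert expression returned a result at evaluation i.
function simulateForClause(results, intervalSec, forSec) {
  const states = [];
  let activeSince = null; // when the current uninterrupted true run started
  results.forEach((isTrue, i) => {
    const now = i * intervalSec;
    if (!isTrue) {
      activeSince = null; // any false evaluation resets the pending timer
      states.push('inactive');
      return;
    }
    if (activeSince === null) activeSince = now;
    states.push(now - activeSince >= forSec ? 'firing' : 'pending');
  });
  return states;
}
```

With a 60 s interval and `for: 5m`, the spiky series `[true, true, false, true, true, true, false]` never leaves `pending`, while six consecutive `true` evaluations reach `firing` on the sixth (300 s after the first).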

Debug Checklist

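# List Prometheus scrape targets with their health and last scrape error
curl -s localhost:9090/api/v1/targets | jq '.data.activeTargets[] | {job: .labels.job, instance: .labels.instance, health, lastError}'

# Check active (pending and firing) alerts
curl -s localhost:9090/api/v1/alerts | jq '.data.alerts'

# Query metrics (-G with --data-urlencode safely encodes PromQL spaces and parentheses)
curl -sG localhost:9090/api/v1/query --data-urlencode 'query=sum by (region) (game_active_players)'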
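The targets check returns every scrape target; with many game servers it is easier to list only the failing ones. A minimal sketch that filters the parsed `/api/v1/targets` response on each active target's `health` field and reports its `scrapeUrl` and `lastError` (fields present in that API's response). `unhealthyTargets` and `fetchUnhealthyTargets` are hypothetical helper names; the latter assumes Node.js 18+ for the global `fetch` and Prometheus at `http://localhost:9090`, as in the commands above.

```javascript
// Return scrape targets that Prometheus does not currently report as "up".
// `response` is the parsed JSON body of GET /api/v1/targets.
function unhealthyTargets(response) {
  const targets = (response && response.data && response.data.activeTargets) || [];
  return targets
    .filter((t) => t.health !== 'up')
    .map((t) => ({
      job: t.labels && t.labels.job,
      instance: t.labels && t.labels.instance,
      scrapeUrl: t.scrapeUrl,
      health: t.health,
      lastError: t.lastError,
    }));
}

// Fetch live data and filter it (requires Node.js 18+ for global fetch).
async function fetchUnhealthyTargets(baseUrl = 'http://localhost:9090') {
  const res = await fetch(`${baseUrl}/api/v1/targets`);
  if (!res.ok) throw new Error(`Prometheus returned HTTP ${res.status}`);
  return unhealthyTargets(await res.json());
}
```

Keeping the filter separate from the HTTP call lets it be unit-tested against a saved response body.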

Unit Test Template

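const prometheus = require('prom-client');
// Assumes the metric definitions are exported from a module at ./metrics (hypothetical path).
const { tickDuration } = require('./metrics');

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

describe('Metrics', () => {
  beforeEach(() => tickDuration.reset()); // isolate observations between tests

  test('records tick duration', async () => {
    const end = tickDuration.startTimer();
    await sleep(10);
    end();

    // Exactly one observation should appear in the exposition output.
    const metrics = await prometheus.register.metrics();
    expect(metrics).toContain('game_tick_duration_seconds_count 1');
  });
});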

Resources

  • assets/ - Dashboard configs
  • references/ - Alerting guides