GASP Diagnostics
Enables comprehensive Linux system diagnostics using GASP's AI-optimized monitoring output. Actively fetches metrics from hosts and provides intelligent analysis with context-aware interpretation.
Fetching GASP Metrics
When user mentions a host or requests a system check:
- Fetch the metrics endpoint: `web_fetch("http://{hostname}:8080/metrics")`
- Hostname formats supported:
  - mDNS/local: `accelerated.local`, `hyperion.local`
  - DNS names: `proxmox1`, `dev-server`, `workstation`
  - IP addresses: `192.168.1.100`
- Default port: 8080 (unless user specifies otherwise)
- Error handling (a fetch sketch follows this list):
  - Host unreachable: Inform user, suggest checking if GASP is running
  - Port closed/refused: Try suggesting `systemctl status gasp` on the host
  - JSON parse error: GASP may not be installed or wrong endpoint
  - Timeout: Network issue or host down
- Multi-host queries: If user mentions multiple hosts, fetch each in sequence and compare
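A minimal fetch sketch using only the Python standard library, for when the metrics need to be pulled outside of `web_fetch`; the helper name is illustrative, and the error branches mirror the cases listed above.

```python
import json
import urllib.error
import urllib.request


def fetch_gasp_metrics(hostname: str, port: int = 8080, timeout: float = 5.0) -> dict:
    """Fetch and parse GASP metrics from a host, mapping failures to the cases above."""
    url = f"http://{hostname}:{port}/metrics"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read())
    except json.JSONDecodeError as exc:
        # Endpoint answered, but not with GASP JSON: wrong endpoint, or GASP not installed.
        raise RuntimeError(f"{hostname}: response was not valid GASP JSON") from exc
    except TimeoutError as exc:
        # Read timed out: network issue or host down.
        raise RuntimeError(f"{hostname}: request timed out after {timeout}s") from exc
    except urllib.error.URLError as exc:
        # Host unreachable or connection refused; suggest checking that GASP is
        # running on the host (e.g. `systemctl status gasp`).
        raise RuntimeError(f"{hostname}: cannot reach GASP at {url}: {exc.reason}") from exc
```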
Quick Diagnosis Workflow
For any system check request:
- Fetch metrics from specified host(s)
- Check summary first: Look at `summary.health` and `summary.concerns[]`
- Identify issues using the metric correlations below
- Report findings with severity and specific recommendations
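A condensed sketch of this workflow, reusing the fetch helper above; the `summary` field names follow the Metric Interpretation section below, and the output format is only illustrative.

```python
def quick_diagnosis(hostname: str) -> str:
    """Fetch metrics for one host and report health, concerns, and recent changes."""
    metrics = fetch_gasp_metrics(hostname)
    summary = metrics.get("summary", {})
    health = summary.get("health", "unknown")      # "healthy" | "degraded" | "critical"
    concerns = summary.get("concerns", [])         # pre-analyzed issues: check these first
    changes = summary.get("recent_changes", [])    # context for the current state

    report = [f"{hostname}: {health}"]
    report += [f"  concern: {item}" for item in concerns]
    report += [f"  recent change: {item}" for item in changes]
    return "\n".join(report)
```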
Trigger Examples
These user messages should trigger this skill and active fetching:
- "Check hyperion for me"
- "What's going on with accelerated.local?"
- "Is proxmox1 having issues?"
- "Compare hyperion and proxmox1"
- "Why is my system slow?" (fetch localhost)
- "Diagnose 192.168.1.50"
- "Check all my proxmox nodes"
Metric Interpretation
Health Summary
- `summary.health`: Quick assessment
  - "healthy": No action needed
  - "degraded": Issues present but not critical
  - "critical": Immediate attention required
- `summary.concerns[]`: Pre-analyzed issues to investigate first
- `summary.recent_changes[]`: Context for current state
CPU Analysis
Load ratio = `load_avg_1m / cores`:
- < 0.7: Normal usage
- 0.7-1.0: Busy but healthy
- 1.0-2.0: Saturated (may cause slowness)
- > 2.0: Severe overload
Key indicators:
- `trend`: "increasing" is concerning even if current load is acceptable
- `baseline_load`: Delta from baseline is more important than absolute value
- `top_processes[]`: Check for unexpected CPU hogs
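A small sketch of the load-ratio arithmetic and baseline comparison, assuming a CPU block that exposes `load_avg_1m`, `cores`, and `baseline_load` as described here.

```python
def classify_cpu(cpu: dict) -> str:
    """Classify CPU saturation from the load ratio and report the delta from baseline."""
    ratio = cpu["load_avg_1m"] / cpu["cores"]
    if ratio < 0.7:
        level = "normal"
    elif ratio <= 1.0:
        level = "busy but healthy"
    elif ratio <= 2.0:
        level = "saturated (may cause slowness)"
    else:
        level = "severe overload"
    # The delta from baseline matters more than the absolute value.
    delta = cpu["load_avg_1m"] - cpu.get("baseline_load", cpu["load_avg_1m"])
    return f"load ratio {ratio:.2f} ({level}), {delta:+.1f} vs baseline"
```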
Memory Analysis
Red flags (priority order):
- `oom_kills_recent > 0`: CRITICAL - system killed processes, find the memory hog immediately
- `swap_used_mb > 0`: Performance degradation in progress
- `pressure_pct > 5%`: System struggling with memory contention
- `usage_percent > 90%`: Getting close to limits
Important: Linux uses memory for cache, so high usage_percent alone is normal. Focus on pressure and swap.
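A sketch of these checks in priority order, assuming a memory block with the fields named above; the thresholds are the ones from this section.

```python
def memory_red_flags(mem: dict) -> list[str]:
    """Return memory concerns in the priority order listed above."""
    flags = []
    if mem.get("oom_kills_recent", 0) > 0:
        flags.append("CRITICAL: recent OOM kills - find the memory hog immediately")
    if mem.get("swap_used_mb", 0) > 0:
        flags.append(f"swapping: {mem['swap_used_mb']} MB in swap")
    if mem.get("pressure_pct", 0) > 5:
        flags.append(f"memory pressure at {mem['pressure_pct']}%")
    if mem.get("usage_percent", 0) > 90:
        # High usage alone is often just page cache; weigh it against pressure and swap.
        flags.append(f"usage at {mem['usage_percent']}% (confirm with pressure/swap)")
    return flags
```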
Disk I/O
Saturation indicators:
- `io_wait_ms > 10`: Significant disk bottleneck
- `queue_depth` consistently high: Disk can't keep up
- High `read_iops` or `write_iops` with slow response: Disk performance issue
Storage capacity:
- `usage_percent > 90%`: Running out of space
- `usage_percent > 95%`: Critical - will cause failures soon
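The same pattern as a sketch for disk; `io_wait_ms` and `usage_percent` are taken from the indicators above.

```python
def disk_concerns(disk: dict) -> list[str]:
    """Flag I/O saturation and storage capacity using the thresholds above."""
    flags = []
    if disk.get("io_wait_ms", 0) > 10:
        flags.append(f"io_wait {disk['io_wait_ms']} ms - significant disk bottleneck")
    usage = disk.get("usage_percent", 0)
    if usage > 95:
        flags.append(f"capacity {usage}% - critical, failures likely soon")
    elif usage > 90:
        flags.append(f"capacity {usage}% - running out of space")
    return flags
```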
Network
- `rx_bytes_per_sec` / `tx_bytes_per_sec`: Check for unexpected traffic spikes
- `errors > 0` or `drops > 0`: Network hardware/configuration issue
- Large number of `time_wait` connections: May indicate a connection leak
Process Intelligence
- `zombie > 0`: Process management bug (usually benign but indicates an issue)
- Processes in D state: Stuck in uninterruptible sleep (disk or kernel issue)
- `new_since_last[]`: Check for unexpected process spawning
Systemd Services
- `units_failed > 0`: Check the `failed_units[]` array
- `recent_restarts[]`: May indicate instability
Log Summary
- `errors_last_interval`: Elevated error rate indicates problems
- `message_rate_per_min`: Spikes suggest a logging storm or serious issue
- Review `recent_errors[]` for specific problems
Desktop Metrics (when present)
- `gpu.utilization_pct` vs CPU: Identify GPU-bound vs CPU-bound workloads
- `gpu.temperature_c > 85`: Thermal throttling likely
- `active_window`: Provides context for resource usage
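The remaining checks (network, processes, systemd, logs, and the optional desktop block) follow the same shape; a condensed sketch, where the top-level key names are assumptions about the GASP layout rather than confirmed structure.

```python
def secondary_concerns(metrics: dict) -> list[str]:
    """Scan network, process, systemd, log, and desktop metrics for the red flags above."""
    flags = []

    net = metrics.get("network", {})
    if net.get("errors", 0) > 0 or net.get("drops", 0) > 0:
        flags.append("network errors/drops - hardware or configuration issue")

    procs = metrics.get("processes", {})
    if procs.get("zombie", 0) > 0:
        flags.append(f"{procs['zombie']} zombie process(es) - usually benign, but investigate")
    if procs.get("new_since_last"):
        flags.append(f"new processes since last snapshot: {procs['new_since_last']}")

    systemd = metrics.get("systemd", {})
    if systemd.get("units_failed", 0) > 0:
        flags.append(f"failed systemd units: {systemd.get('failed_units', [])}")

    logs = metrics.get("logs", {})
    if logs.get("errors_last_interval", 0) > 0:
        flags.append(f"{logs['errors_last_interval']} log errors in the last interval")

    gpu = metrics.get("gpu", {})
    if gpu.get("temperature_c", 0) > 85:
        flags.append(f"GPU at {gpu['temperature_c']} C - thermal throttling likely")

    return flags
```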
Common System Patterns
Development Workstation (Expected)
- High memory usage from IDEs, browsers
- Firefox/Chrome often in top memory consumers
- Docker daemon using CPU/memory
- VSCode, JetBrains IDEs in top processes
- Baseline load: 10-30% of cores
Container Host (Expected)
- Elevated baseline load (many processes)
- dockerd/containerd in top processes
- 50-70% memory usage normal
- Many processes in top list
Proxmox/Virtualization Host (Expected)
- Baseline load proportional to VM count
- Consistent low-level resource usage
- ~2GB overhead for Proxmox itself
- Multiple QEMU/KVM processes
GPU Workload (Expected)
- High GPU utilization with lower CPU
- Significant GPU memory usage
- Common for: rendering, ML inference, gaming
Multi-Host Analysis
When checking multiple hosts:
- Fetch all hosts first (parallel thinking)
- Compare baselines: Identify outliers
- Look for correlations: Network event vs individual host issue
- Check recent_changes: Migrations, deployments, package updates
- Identify the odd one out: Which host differs from the pattern?
Example analysis pattern:
Host 1: Load 2.1/8 cores (26%), normal
Host 2: Load 7.8/8 cores (97%), ATTENTION NEEDED ← outlier
Host 3: Load 1.9/8 cores (24%), normal
Focus on Host 2 - investigate top_processes
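A sketch of this comparison, reusing the fetch helper from earlier; the `cpu` key and the 2x-median outlier rule are illustrative assumptions, not GASP-defined behavior.

```python
def compare_hosts(hostnames: list[str]) -> None:
    """Fetch each host's CPU block and print load ratios so the outlier stands out."""
    ratios = {}
    for host in hostnames:
        try:
            cpu = fetch_gasp_metrics(host)["cpu"]
            ratios[host] = cpu["load_avg_1m"] / cpu["cores"]
        except Exception as exc:   # an unreachable host is itself a finding
            print(f"{host}: fetch failed ({exc})")
    if not ratios:
        return
    median = sorted(ratios.values())[len(ratios) // 2]
    for host, ratio in ratios.items():
        flag = "  <- outlier, check top_processes" if ratio > 2 * median and ratio > 0.7 else ""
        print(f"{host}: load ratio {ratio:.0%}{flag}")
```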
Diagnosis Strategies
"System is slow"
- Check load ratio (CPU saturation?)
- Check io_wait (disk bottleneck?)
- Check memory pressure (swapping?)
- Identify top consumer in relevant category
- Assess if consumption is expected for that process
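Tying the earlier sketches together in the order of this checklist; the top-level `cpu`, `disk`, and `memory` keys are assumed names.

```python
def triage_slowness(hostname: str) -> list[str]:
    """Walk the slowness checklist in order: CPU saturation, disk wait, memory pressure."""
    metrics = fetch_gasp_metrics(hostname)
    findings = [classify_cpu(metrics["cpu"])]        # 1. load ratio
    findings += disk_concerns(metrics["disk"])       # 2. io_wait / capacity
    findings += memory_red_flags(metrics["memory"])  # 3. swap and pressure
    # 4-5. name the top consumer and judge whether it is expected for this host type
    top = metrics["cpu"].get("top_processes", [])
    if top:
        findings.append(f"top CPU consumer: {top[0]}")
    return findings
```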
"High memory usage"
- First: Check pressure_pct (real issue or just cache?)
- Check swap_used_mb (actual problem?)
- Find top memory consumers
- Check process uptime (leak or normal?)
- Compare to baseline (delta more important than absolute)
"Unexpected behavior"
- Check recent_changes for clues
- Review systemd failed units
- Check recent_errors in logs
- Look for new processes since last snapshot
- Compare current metrics to baseline
Reporting Guidelines
When reporting findings:
- Start with verdict: "Healthy", "Issue found", "Critical problem"
- Be specific: Name the process/service causing issues
- Provide context: Is this expected for this host type?
- Give actionable recommendations: What should user do?
- Include relevant metrics: Back up findings with data
Good example:
"Issue found on accelerated.local: Memory pressure at 8.2%. The postgres container started swapping 2 hours ago and is now using 12GB RAM (up from 4GB baseline). This likely indicates a query leak. Recommend checking recent queries and restarting the container."
Bad example:
"Memory usage is high. You might want to look into it."
Advanced Diagnostics
For complex issues or when initial analysis is unclear, consult:
- references/diagnostic-workflows.md - Detailed diagnostic procedures
- references/common-patterns.md - Infrastructure-specific patterns
Using with Provided JSON
If user pastes GASP JSON instead of requesting a fetch:
- Analyze the provided JSON using all guidance above
- Don't attempt to fetch (data already provided)
- Apply same interpretation and reporting guidelines