/check-production Skill

Audit production health and output the findings as a structured report.

What This Does

Query Sentry for unresolved issues
Check Vercel logs for recent errors
Test health endpoints
Check GitHub Actions for CI/CD failures
Output prioritized findings (P0-P3)

This is a primitive: it only investigates and reports. Use /log-production-issues to create GitHub issues from the findings, or /triage to fix them.

Process

1. Sentry Check

# Run triage script if available
~/.claude/skills/triage/scripts/check_sentry.sh 2>/dev/null || echo "Sentry check unavailable"

Alternatively, query Sentry through its MCP server if one is configured.

2. Vercel Logs Check

# Check for recent errors
~/.claude/skills/triage/scripts/check_vercel_logs.sh 2>/dev/null || vercel logs --output json 2>/dev/null | head -50

3. Health Endpoints

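# Test health endpoint (falls back to the app URL in .env.local)
# -f2- keeps URLs that contain "="; tr strips optional quotes around the value
~/.claude/skills/triage/scripts/check_health_endpoints.sh 2>/dev/null || curl -sf "$(grep -m1 '^NEXT_PUBLIC_APP_URL=' .env.local 2>/dev/null | cut -d= -f2- | tr -d '"')/api/health" | jq .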
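The one-liner above only tells you whether the endpoint answered. The sample report further down also flags slow responses, so a rough sketch of a timed check may be useful. Everything here is illustrative: `checkEndpoint`, `FetchLike`, and `SLOW_THRESHOLD_MS` are hypothetical names, and the 500 ms threshold is simply borrowed from the "should be <500ms" note in the sample output.

```typescript
// Minimal response shape so the sketch does not depend on DOM or Node typings.
interface HealthResponse {
  ok: boolean;
  status: number;
}

// Any function with this shape works, including the global fetch on Node 18+.
type FetchLike = (url: string) => Promise<HealthResponse>;

interface EndpointResult {
  url: string;
  reachable: boolean;
  httpStatus: number | null;
  latencyMs: number;
  slow: boolean;
}

// Assumption: mirrors the "<500ms" expectation in the sample report.
const SLOW_THRESHOLD_MS = 500;

async function checkEndpoint(url: string, fetchFn: FetchLike): Promise<EndpointResult> {
  const started = Date.now();
  try {
    const res = await fetchFn(url);
    const latencyMs = Date.now() - started;
    return {
      url,
      reachable: res.ok,
      httpStatus: res.status,
      latencyMs,
      slow: latencyMs > SLOW_THRESHOLD_MS,
    };
  } catch {
    // Network-level failure (DNS, refused connection, TLS error, and so on).
    return { url, reachable: false, httpStatus: null, latencyMs: Date.now() - started, slow: false };
  }
}
```

Passing the fetch function in keeps the check testable without network access; in a real run you would likely call something like `checkEndpoint(appUrl + "/api/health", fetch)`.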

4. GitHub CI/CD Check

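# Check for failed workflow runs on the default branch.
# An empty result is not an error exit, so chaining main/master with || never
# reaches the fallback; resolve the default branch explicitly instead.
DEFAULT_BRANCH="$(gh repo view --json defaultBranchRef --jq .defaultBranchRef.name 2>/dev/null)"
gh run list --branch "${DEFAULT_BRANCH:-main}" --status failure --limit 5 2>/dev/null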

# Get details on most recent failure
gh run list --status failure --limit 1 --json databaseId,name,conclusion,createdAt,headBranch 2>/dev/null

# Check for stale/stuck workflows
gh run list --status in_progress --json databaseId,name,createdAt 2>/dev/null

What to look for:

Failed runs on main/master branch (broken CI)
Failed runs on feature branches blocking PRs
Stuck/in-progress runs that should have completed
Patterns in failure types (tests, lint, build, deploy)
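The first three items above can be sketched as a small classifier over the JSON that `gh run list --json` emits. This is a sketch under assumptions, not part of the skill: it assumes the `name`, `status`, `conclusion`, `createdAt`, and `headBranch` fields were requested, and the 60-minute "stuck" threshold is an arbitrary placeholder, since the skill does not define one. `classifyRuns` and `STUCK_AFTER_MINUTES` are hypothetical names.

```typescript
// One entry from `gh run list --json name,status,conclusion,createdAt,headBranch`.
interface WorkflowRun {
  name: string;
  status: string;      // e.g. "in_progress", "completed"
  conclusion: string;  // e.g. "failure", "success"; empty while still running
  createdAt: string;   // ISO 8601 timestamp
  headBranch: string;
}

interface CiFindings {
  defaultBranchFailures: WorkflowRun[]; // broken CI on main/master
  featureBranchFailures: WorkflowRun[]; // failures that may block PRs
  stuckRuns: WorkflowRun[];             // in progress for suspiciously long
}

// Assumption: a run still in progress after this many minutes counts as stuck.
const STUCK_AFTER_MINUTES = 60;

function classifyRuns(
  runs: WorkflowRun[],
  defaultBranch: string,
  now: Date = new Date(),
): CiFindings {
  const findings: CiFindings = {
    defaultBranchFailures: [],
    featureBranchFailures: [],
    stuckRuns: [],
  };
  for (const run of runs) {
    if (run.conclusion === "failure") {
      if (run.headBranch === defaultBranch) {
        findings.defaultBranchFailures.push(run);
      } else {
        findings.featureBranchFailures.push(run);
      }
    } else if (run.status === "in_progress") {
      const ageMinutes = (now.getTime() - new Date(run.createdAt).getTime()) / 60000;
      if (ageMinutes > STUCK_AFTER_MINUTES) {
        findings.stuckRuns.push(run);
      }
    }
  }
  return findings;
}
```

Spotting patterns in failure types still needs a human (or the run logs); the classifier only separates runs by branch and age.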

5. Quick Application Checks

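# Check for error handling gaps
# Matches single-line empty catch blocks: `catch (e) {}` and `catch {}`
grep -rnE 'catch[[:space:]]*(\([^)]*\))?[[:space:]]*\{[[:space:]]*\}' --include="*.ts" --include="*.tsx" src/ app/ 2>/dev/null | head -5
# Empty catch blocks = silent failures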

Output Format

## Production Health Check

### P0: Critical (Active Production Issues)
- [SENTRY-123] PaymentIntent failed - 23 users affected (Score: 147)
  Location: api/checkout.ts:45
  First seen: 2h ago

### P1: High (Degraded Performance / Broken CI)
- Health endpoint slow: /api/health responding in 2.3s (should be <500ms)
- Vercel logs show 5xx errors in last hour (count: 12)
- [CI] Main branch failing: "Build" workflow (run #1234)
  Failed step: "Type check"
  Error: Type 'string' is not assignable to type 'number'

### P2: Medium (Warnings)
- 3 empty catch blocks found (silent failures)
- Health endpoint missing database connectivity check
- [CI] 3 feature branch workflows failing (blocking PRs)

### P3: Low (Improvements)
- Consider adding Sentry performance monitoring
- Health endpoint could include more service checks

## Summary
- P0: 1 | P1: 3 | P2: 3 | P3: 2
- Recommendation: Fix P0 immediately, then fix main branch CI
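The counts line in the summary is easy to derive mechanically once findings carry a priority. A minimal sketch, assuming each finding is recorded as a priority plus a message; the `Finding` shape and `summarize` function are hypothetical and not part of the skill.

```typescript
type Priority = "P0" | "P1" | "P2" | "P3";

interface Finding {
  priority: Priority;
  message: string;
}

// Fixed order so the summary always reads P0 through P3.
const PRIORITY_ORDER: Priority[] = ["P0", "P1", "P2", "P3"];

// Builds the "P0: n | P1: n | P2: n | P3: n" line used in the report summary.
function summarize(findings: Finding[]): string {
  const counts: Record<Priority, number> = { P0: 0, P1: 0, P2: 0, P3: 0 };
  for (const finding of findings) {
    counts[finding.priority] += 1;
  }
  return PRIORITY_ORDER.map((p) => `${p}: ${counts[p]}`).join(" | ");
}
```

Priorities with no findings are still printed with a zero count, so the line keeps the same shape across runs.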

Priority Mapping

| Signal | Priority |
|--------|----------|
| Active errors affecting users | P0 |
| 5xx errors, slow responses | P1 |
| Main branch CI/CD failing | P1 |
| Feature branch CI blocking PRs | P2 |
| Silent failures, missing checks | P2 |
| Missing monitoring, improvements | P3 |
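The mapping above is a plain lookup, so it can be encoded as data rather than branching logic. A minimal sketch: the signal identifiers are hypothetical labels invented here for the six rows, and `highestPriority` is an illustrative helper, not something the skill defines.

```typescript
type Priority = "P0" | "P1" | "P2" | "P3";

// Hypothetical identifiers, one per row of the priority mapping.
type Signal =
  | "active_user_errors"
  | "server_errors_or_slow_responses"
  | "main_branch_ci_failing"
  | "feature_branch_ci_blocking"
  | "silent_failures_or_missing_checks"
  | "missing_monitoring";

const PRIORITY_BY_SIGNAL: Record<Signal, Priority> = {
  active_user_errors: "P0",
  server_errors_or_slow_responses: "P1",
  main_branch_ci_failing: "P1",
  feature_branch_ci_blocking: "P2",
  silent_failures_or_missing_checks: "P2",
  missing_monitoring: "P3",
};

// Returns the most urgent priority among the observed signals, or null if none.
function highestPriority(signals: Signal[]): Priority | null {
  let highest: Priority | null = null;
  for (const signal of signals) {
    const priority = PRIORITY_BY_SIGNAL[signal];
    // "P0" < "P1" < "P2" < "P3" under plain string comparison.
    if (highest === null || priority < highest) {
      highest = priority;
    }
  }
  return highest;
}
```

Typing the table as `Record<Signal, Priority>` means the compiler complains if a signal is added without a priority.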

Health Endpoint Anti-Pattern

Health checks that lie are worse than no health check. Example:

// ❌ BAD: Reports "ok" without checking
return { status: "ok", services: { database: "ok" } };

// ✅ GOOD: Honest liveness probe (no fake service status)
return { status: "ok", timestamp: new Date().toISOString() };

// ✅ BETTER: Real readiness probe
const dbStatus = await checkDatabase() ? "ok" : "error";
return { status: dbStatus === "ok" ? "ok" : "degraded", services: { database: dbStatus } };

If you can't verify a service, don't report on it. False "ok" status masks outages.
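One way to follow this rule is to build the report only from checks that actually ran, and to treat a thrown error or a timeout as a failure rather than as "unknown". The sketch below is illustrative: `probe`, `buildHealthReport`, and the 2-second default timeout are assumptions, and each check is a caller-supplied function (for example, a real database ping) that resolves to `true` when the service is healthy.

```typescript
type ServiceStatus = "ok" | "error";

interface HealthReport {
  status: "ok" | "degraded";
  timestamp: string;
  services: Record<string, ServiceStatus>;
}

// Reports "error" if the check throws, resolves to false, or exceeds the timeout.
async function probe(check: () => Promise<boolean>, timeoutMs: number): Promise<ServiceStatus> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<boolean>((resolve) => {
    timer = setTimeout(() => resolve(false), timeoutMs);
  });
  try {
    const healthy = await Promise.race([check(), timeout]);
    return healthy ? "ok" : "error";
  } catch {
    return "error";
  } finally {
    clearTimeout(timer);
  }
}

// Only services with a real check appear in the report; nothing is assumed "ok".
async function buildHealthReport(
  checks: Record<string, () => Promise<boolean>>,
  timeoutMs: number = 2000, // assumption: arbitrary default, tune per service
): Promise<HealthReport> {
  const names = Object.keys(checks);
  const statuses = await Promise.all(names.map((name) => probe(checks[name], timeoutMs)));
  const services: Record<string, ServiceStatus> = {};
  names.forEach((name, i) => {
    services[name] = statuses[i];
  });
  const allOk = statuses.every((s) => s === "ok");
  return { status: allOk ? "ok" : "degraded", timestamp: new Date().toISOString(), services };
}
```

With no checks registered, this degenerates into the honest liveness probe: `status: "ok"`, a timestamp, and an empty `services` object.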

Analytics Note

This skill checks production health (errors, logs, endpoints), not product analytics.

For analytics auditing, see /check-observability. Note:

PostHog is REQUIRED for product analytics (has MCP server)
Vercel Analytics is NOT acceptable (no CLI/API/MCP - unusable for our workflow)

If you need to investigate user behavior or funnels during incident response, query PostHog via MCP.

Related

/log-production-issues - Create GitHub issues from findings
/triage - Fix production issues
/observability - Set up monitoring infrastructure
