Agent Skills: Intercom Incident Runbook

Category: Uncategorized
ID: jeremylongshore/claude-code-plugins-plus-skills/intercom-incident-runbook

Install this agent skill locally:

pnpm dlx add-skill https://github.com/jeremylongshore/claude-code-plugins-plus-skills/tree/HEAD/plugins/saas-packs/intercom-pack/skills/intercom-incident-runbook

plugins/saas-packs/intercom-pack/skills/intercom-incident-runbook/SKILL.md

Skill Metadata

Name
intercom-incident-runbook
Description
'Execute Intercom incident response procedures with triage, mitigation,

Intercom Incident Runbook

Overview

Rapid incident response procedures for Intercom integration failures, including triage by HTTP status code, mitigation steps, and a postmortem template.

Severity Levels

| Level | Definition | Response Time | Example |
|-------|------------|---------------|---------|
| P1 | All Intercom API calls failing | < 15 min | 401 auth failures, API unreachable |
| P2 | Degraded service | < 1 hour | High latency, rate limited (429) |
| P3 | Partial impact | < 4 hours | Webhook delays, search timeouts |
| P4 | No user impact | Next business day | Monitoring gaps, stale cache |

Quick Triage (Copy-Paste)

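#!/bin/bash
# Fail fast if the token is missing; the authenticated checks below depend on it.
: "${INTERCOM_ACCESS_TOKEN:?Export INTERCOM_ACCESS_TOKEN first}"
echo "=== Intercom Incident Triage ==="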

# 1. Is Intercom's API responding?
echo -n "1. API reachable: "
curl -s -o /dev/null -w "%{http_code}" \
  -H "Authorization: Bearer $INTERCOM_ACCESS_TOKEN" \
  https://api.intercom.io/me
echo ""

# 2. Is there a platform-wide incident?
echo -n "2. Intercom status: "
curl -s https://status.intercom.com/api/v2/status.json | jq -r '.status.description'

# 3. Active incidents on Intercom's side?
echo -n "3. Active incidents: "
curl -s https://status.intercom.com/api/v2/incidents/unresolved.json | jq '.incidents | length'

# 4. Rate limit status
echo -n "4. Rate limit remaining: "
curl -s -D - -o /dev/null \
  -H "Authorization: Bearer $INTERCOM_ACCESS_TOKEN" \
  https://api.intercom.io/me 2>/dev/null | grep -i x-ratelimit-remaining | awk '{print $2}'

# 5. Our health check
# jq -e exits non-zero on null or empty input, so UNKNOWN also prints when the endpoint is down
echo -n "5. Our integration health: "
curl -s https://your-app.com/health | jq -e '.services.intercom.status' 2>/dev/null || echo "UNKNOWN"
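
Steps 2 and 3 of the script can also run inside the application, for example behind a health endpoint. The sketch below interprets the two status page responses the script fetches. `summarizeIntercomStatus` is a hypothetical helper, and the payload shapes are an assumption based on the standard Statuspage v2 format (`status.indicator`, `status.description`, `incidents[]`).

```typescript
// Assumed shapes of the two Statuspage v2 payloads fetched in steps 2 and 3.
interface StatusPayload {
  status: { indicator: string; description: string };
}
interface IncidentsPayload {
  incidents: { name: string }[];
}

interface StatusSummary {
  platformIncident: boolean; // true when Intercom itself reports a problem
  description: string;
  activeIncidents: number;
}

// Hypothetical helper: combines both payloads into a single triage answer.
function summarizeIntercomStatus(
  status: StatusPayload,
  unresolved: IncidentsPayload,
): StatusSummary {
  const activeIncidents = unresolved.incidents.length;
  return {
    // Statuspage uses the indicator "none" when all systems are operational.
    platformIncident: status.status.indicator !== "none" || activeIncidents > 0,
    description: status.status.description,
    activeIncidents,
  };
}
```

The inputs would come from parsing the JSON bodies of `https://status.intercom.com/api/v2/status.json` and `https://status.intercom.com/api/v2/incidents/unresolved.json`, the same URLs the script uses.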

Decision Tree

API returning errors?
├── YES ──▶ Check status.intercom.com
│           ├── Incident reported ──▶ Intercom's problem
│           │   → Enable graceful degradation
│           │   → Monitor for resolution
│           │   → No further action needed on our side
│           └── No incident ──▶ Our integration issue
│               ├── 401 → Token expired/revoked → Rotate token
│               ├── 403 → Scope missing → Add OAuth scope
│               ├── 429 → Rate limited → Enable queue/backoff
│               └── 5xx → Server error → Retry with backoff
└── NO ──▶ Is our service healthy?
           ├── YES → Resolved or intermittent → Monitor
           └── NO → Our infrastructure issue
               → Check pods, memory, network, DNS
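
The tree above can be encoded as a small pure function so that alerting or chat-ops tooling suggests the same next step every time. This is a minimal sketch: `triage` and `TriageInput` are hypothetical names, and treating a status of `0` as "no HTTP response" is a convention assumed here, not something Intercom defines.

```typescript
interface TriageInput {
  apiStatus: number; // HTTP status from GET /me; 0 means no response (assumed convention)
  intercomIncidentReported: boolean; // taken from status.intercom.com
  ourServiceHealthy: boolean; // taken from our own health check
}

// Walks the decision tree and returns the recommended next step.
function triage(input: TriageInput): string {
  const apiFailing = input.apiStatus === 0 || input.apiStatus >= 400;

  if (!apiFailing) {
    return input.ourServiceHealthy
      ? "Resolved or intermittent: monitor"
      : "Our infrastructure issue: check pods, memory, network, DNS";
  }
  if (input.intercomIncidentReported) {
    return "Intercom's problem: enable graceful degradation and monitor for resolution";
  }
  // No incident on Intercom's side, so the fault is in our integration.
  if (input.apiStatus === 401) return "Token expired/revoked: rotate token";
  if (input.apiStatus === 403) return "Scope missing: add OAuth scope";
  if (input.apiStatus === 429) return "Rate limited: enable queue/backoff";
  if (input.apiStatus >= 500) return "Server error: retry with backoff";
  return "Unmapped status: investigate manually";
}
```

Statuses the tree does not cover (for example 404, or no response at all) fall through to the manual-investigation branch instead of being guessed at.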

Mitigation by Error Type

401 - Authentication Failed

# Verify token is valid
curl -s -H "Authorization: Bearer $INTERCOM_ACCESS_TOKEN" \
  https://api.intercom.io/me | jq '.type'
# Expected: "admin"
# If error: Token is invalid or revoked

# IMMEDIATE: Regenerate token
# Developer Hub > Your App > Authentication > Generate new token
# Update in secret manager:
aws secretsmanager update-secret \
  --secret-id intercom/production/token \
  --secret-string "new_token_here"

# Restart application to pick up new token
kubectl rollout restart deployment/intercom-service

429 - Rate Limited

# Check rate limit headers
curl -s -D - -o /dev/null \
  -H "Authorization: Bearer $INTERCOM_ACCESS_TOKEN" \
  https://api.intercom.io/me 2>/dev/null | grep -i "x-ratelimit"

# Immediate: Reduce request volume
# - Pause any batch/sync jobs
# - Enable request queuing if available

# Check if multiple apps are consuming workspace quota
# Limit: 25,000 req/min per workspace across all apps
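
The rate limit headers checked above can also drive an automatic pause in application code. The sketch below computes how long to wait before the next request. `msUntilRateLimitReset` and `HeaderReader` are hypothetical names, and the code assumes `X-RateLimit-Reset` carries a Unix timestamp in seconds; confirm that against the headers your workspace actually returns.

```typescript
// Minimal read-only view of response headers; fetch's Response.headers satisfies it.
interface HeaderReader {
  get(name: string): string | null;
}

// Returns how many milliseconds to pause before sending the next request.
// Assumption: x-ratelimit-reset is a Unix timestamp in seconds.
function msUntilRateLimitReset(headers: HeaderReader, nowMs: number = Date.now()): number {
  const remainingRaw = headers.get("x-ratelimit-remaining");
  const resetRaw = headers.get("x-ratelimit-reset");
  if (remainingRaw === null || resetRaw === null) return 0; // headers absent: nothing to act on

  const remaining = Number(remainingRaw);
  const resetMs = Number(resetRaw) * 1000;
  if (!Number.isFinite(remaining) || !Number.isFinite(resetMs)) return 0;

  if (remaining > 0) return 0; // quota still available in this window
  return Math.max(0, resetMs - nowMs); // wait until the window resets
}
```

A request queue could call this after every response and sleep for the returned duration before dequeuing the next job.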

5xx - Intercom Server Errors

# 1. Check Intercom status
curl -s https://status.intercom.com/api/v2/status.json | jq

# 2. Enable graceful degradation
# Your app should serve cached data or fallback UI

# 3. Track request_id from error responses for Intercom support
# Error response includes: { "request_id": "req_abc123" }
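
For 5xx responses, the "retry with backoff" step and the `request_id` tracking can be combined in one wrapper. This is a sketch under stated assumptions: `retryOn5xx` and `HttpLikeError` are hypothetical, errors are assumed to expose a numeric `statusCode` and an optional `requestId`, and the delays are plain exponential backoff without jitter.

```typescript
// Assumed error shape: an HTTP status plus the request_id from the error body.
interface HttpLikeError {
  statusCode: number;
  requestId?: string;
}

function isHttpLikeError(err: unknown): err is HttpLikeError {
  return typeof err === "object" && err !== null &&
    typeof (err as HttpLikeError).statusCode === "number";
}

// Retries `call` on 5xx errors, doubling the delay each time, and logs the
// collected request_ids once it gives up so they can be sent to Intercom support.
async function retryOn5xx<T>(
  call: () => Promise<T>,
  maxAttempts = 4,
  baseDelayMs = 500,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  const requestIds: string[] = [];
  for (let attempt = 1; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      if (isHttpLikeError(err) && err.requestId) requestIds.push(err.requestId);
      const retryable = isHttpLikeError(err) && err.statusCode >= 500;
      if (!retryable || attempt >= maxAttempts) {
        if (requestIds.length > 0) {
          console.error(`[Intercom] request_ids for support: ${requestIds.join(", ")}`);
        }
        throw err;
      }
      // With the defaults, waits 500ms, then 1s, then 2s.
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
}
```

The injectable `sleep` parameter exists only so the wrapper can be tested without real delays. Non-5xx errors such as 401 or 429 are rethrown immediately, since they need the mitigations described in their own sections.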

Graceful Degradation Pattern

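import { IntercomClient, IntercomError } from "intercom-client";
import { LRUCache } from "lru-cache";

// The client reads the same token the triage script uses.
const client = new IntercomClient({ token: process.env.INTERCOM_ACCESS_TOKEN ?? "" });

const cache = new LRUCache<string, any>({ max: 10000, ttl: 3600000 }); // 1hr fallback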

async function getContactWithFallback(contactId: string): Promise<any> {
  try {
    const contact = await client.contacts.find({ contactId });
    cache.set(contactId, contact); // Update cache on success
    return contact;
  } catch (err) {
    if (err instanceof IntercomError && (err.statusCode === 429 || (err.statusCode ?? 0) >= 500)) {
      // Return stale cached data during outages
      const cached = cache.get(contactId);
      if (cached) {
        console.warn(`[Intercom] Serving cached data for ${contactId} due to ${err.statusCode}`);
        return { ...cached, _stale: true };
      }
    }
    throw err;
  }
}

Communication Templates

Internal Slack

[P1] INCIDENT: Intercom Integration
Status: INVESTIGATING
Impact: [Customer conversations not loading / messages not sending]
Cause: [Intercom API returning 5xx / our token expired / rate limited]
Action: [Enabling fallback / rotating token / pausing sync jobs]
Next update: [Time]
Commander: @[name]
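
If updates are posted by a bot, the template can be filled from structured fields so every update has the same layout. A minimal sketch: `formatIncidentMessage` and `IncidentUpdate` are hypothetical names, and the status values other than `INVESTIGATING` are assumptions, not part of the template above.

```typescript
interface IncidentUpdate {
  severity: "P1" | "P2" | "P3" | "P4";
  // Only INVESTIGATING appears in the template; the other values are assumed.
  status: "INVESTIGATING" | "IDENTIFIED" | "MONITORING" | "RESOLVED";
  impact: string;
  cause: string;
  action: string;
  nextUpdate: string;
  commander: string; // Slack handle without the leading @
}

// Renders one update in the same line order as the template.
function formatIncidentMessage(u: IncidentUpdate): string {
  return [
    `[${u.severity}] INCIDENT: Intercom Integration`,
    `Status: ${u.status}`,
    `Impact: ${u.impact}`,
    `Cause: ${u.cause}`,
    `Action: ${u.action}`,
    `Next update: ${u.nextUpdate}`,
    `Commander: @${u.commander}`,
  ].join("\n");
}
```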

Postmortem Template

## Incident: Intercom [Type]
**Date:** YYYY-MM-DD HH:MM - HH:MM UTC
**Duration:** X hours Y minutes
**Severity:** P[1-4]
**Intercom request_ids:** [req_abc123, req_def456]

### Summary
[1-2 sentences describing what happened and user impact]

### Timeline
- HH:MM - First alert: [what triggered]
- HH:MM - Triage started: [findings]
- HH:MM - Mitigation: [action taken]
- HH:MM - Resolution: [what fixed it]

### Root Cause
[Technical explanation of why it happened]

### Impact
- Conversations affected: N
- Users unable to reach support: N
- Duration of degraded service: Xm

### Action Items
- [ ] [Preventive measure] - Owner - Due
- [ ] [Monitoring gap to fill] - Owner - Due
- [ ] [Documentation to update] - Owner - Due

Error Handling

| Issue | Cause | Solution |
|-------|-------|----------|
| Triage script fails | Token not set | Export INTERCOM_ACCESS_TOKEN |
| Status page unreachable | DNS/network | Try mobile network or VPN |
| Can't rotate token | No Developer Hub access | Escalate to workspace admin |
| Cache empty during outage | No pre-warming | Implement cache warming job |
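
The cache warming job from the last row could look like the sketch below. `warmCache` is a hypothetical helper. It takes the fetch function and the cache as parameters so it stays independent of any particular SDK or cache library, and it fetches sequentially as a simple (assumed) way to stay well under the rate limit.

```typescript
// Hypothetical warming job: fetches each record once and stores it in the cache.
// A failed fetch is recorded and skipped so one bad ID does not stop the run.
async function warmCache<T>(
  ids: string[],
  fetchOne: (id: string) => Promise<T>,
  cache: { set(key: string, value: T): unknown },
): Promise<{ warmed: number; failed: string[] }> {
  let warmed = 0;
  const failed: string[] = [];
  for (const id of ids) {
    try {
      cache.set(id, await fetchOne(id)); // one request at a time
      warmed++;
    } catch {
      failed.push(id);
    }
  }
  return { warmed, failed };
}
```

Any object with a compatible `set` method works as the cache, including a plain `Map`. Which contact IDs to warm (for example, recently active ones) is left to the caller.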

Resources

Next Steps

For data handling compliance, see intercom-data-handling.