
Agent Skills with tag: incident-management

7 skills match this tag. Use tags to discover related Agent Skills and explore similar workflows.

error-recovery

Use when encountering failures - assess severity, preserve evidence, execute rollback decision tree, and verify post-recovery state

rollback-strategies, error-handling, incident-management, runbook
troykelly
1

lessons-learned

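A knowledge base that systematically manages the lessons learned and best practices extracted from incidents so they can be shared and applied as team-wide knowledge. A core Skill for continuous learning and quality improvement.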

lessons-learned, incident-management, best-practices, continuous-improvement
Gaku52
1

crisis_persistence_eval


risk-assessment, incident-management, crisis-management, evaluation
GOATnote-Inc
31

Root Cause Analysis Methodology

This skill should be used when the user asks to "perform root cause analysis", "investigate production issue", "analyze incident", "find root cause", "debug production error", "trace the cause", or mentions investigating production problems, alerts, or outages. Provides systematic RCA methodology and investigation workflows.

root-cause-analysis, incident-management, troubleshooting, process-management
evangelosmeklis
7

postmortem

Use when analyzing failures, outages, incidents, or negative outcomes, conducting blameless postmortems, documenting root causes with 5 Whys or fishbone diagrams, identifying corrective actions with owners and timelines, learning from near-misses, establishing prevention strategies, or when user mentions postmortem, incident review, failure analysis, RCA, lessons learned, or after-action review.

incident-management, root-cause-analysis, five-whys, fishbone-diagram
lyndonkl
82

sre-engineer

Use when defining SLIs/SLOs, managing error budgets, or building reliable systems at scale. Invoke for incident management, chaos engineering, toil reduction, capacity planning. Keywords: SRE, site reliability, SLO, SLI, error budget, incident management, chaos engineering.

site-reliability-engineering, service-level-objectives, error-budget, incident-management
Jeffallan
245

alert-management

Implement comprehensive alert management with PagerDuty, escalation policies, and incident coordination. Use when setting up alerting systems, managing on-call schedules, or coordinating incident response.

monitoring, pagerduty, incident-management, escalation-policies
aj-geddes
301