Agent Skills: Testing Ransomware Recovery Procedures


Install this agent skill locally:

pnpm dlx add-skill https://github.com/plurigrid/asi/tree/HEAD/plugins/asi/skills/testing-ransomware-recovery-procedures

Testing Ransomware Recovery Procedures

When to Use

Use this skill when:

  • Validating that ransomware recovery plans actually work under realistic conditions
  • Measuring RTO (Recovery Time Objective) and RPO (Recovery Point Objective) against business requirements
  • Testing backup restore operations to confirm data integrity and completeness after simulated encryption
  • Conducting tabletop exercises or live recovery drills for ransomware scenarios
  • Auditing disaster recovery readiness as part of compliance or cyber insurance requirements

Do not use for active incident response during a live ransomware attack. Use dedicated IR playbooks instead.

Prerequisites

  • Isolated recovery test environment (air-gapped or network-segmented lab)
  • Access to backup infrastructure (Veeam, Commvault, Rubrik, AWS Backup, Azure Backup)
  • Documented RTO/RPO targets per application tier from business impact analysis
  • Backup copies available for restore testing (production replicas or test snapshots)
  • Recovery runbooks with step-by-step procedures for each critical system

Workflow

Step 1: Define Recovery Test Scope

Identify critical systems and their tiered recovery targets:

| Tier | System Type | RTO Target | RPO Target | Example |
|------|-------------|------------|------------|---------|
| Tier 1 | Mission-critical | < 1 hour | < 15 min | Active Directory, core database |
| Tier 2 | Business-critical | < 4 hours | < 1 hour | ERP, email, CRM |
| Tier 3 | Business-operational | < 24 hours | < 4 hours | File shares, internal apps |
| Tier 4 | Non-critical | < 72 hours | < 24 hours | Dev/test, analytics |
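A drill script needs these targets in machine-readable form before it can compare them with measured values. The sketch below encodes the table as minutes; `tier_targets` is a hypothetical helper name, and the numbers are the example values from the table rather than recommendations.

```shell
# Hypothetical helper: print "RTO RPO" targets in minutes for a tier number (1-4).
# Values mirror the example tier table; replace them with your own BIA figures.
tier_targets() {
  case "$1" in
    1) echo "60 15" ;;      # < 1 hour RTO, < 15 min RPO
    2) echo "240 60" ;;     # < 4 hours RTO, < 1 hour RPO
    3) echo "1440 240" ;;   # < 24 hours RTO, < 4 hours RPO
    4) echo "4320 1440" ;;  # < 72 hours RTO, < 24 hours RPO
    *) echo "unknown tier: $1" >&2; return 1 ;;
  esac
}

# Example: load the Tier 2 targets into the positional parameters.
set -- $(tier_targets 2)
echo "Tier 2: RTO $1 min, RPO $2 min"
```

Keeping the targets in one function means the test report and the drill scripts read the same numbers.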

Step 2: Prepare Test Environment

# Verify the recovery network is isolated: list every route that is NOT the recovery VLAN.
# Expected output is empty; any line printed (including a default route) may lead to production.
ip route show | grep -vF "192.168.100.0/24"

# Verify backup catalog is accessible
restic snapshots --repo s3:s3.amazonaws.com/backup-bucket --password-file /etc/restic/pw
# Or for Veeam:
# Get-VBRBackup | Where-Object {$_.JobType -eq "Backup"} | Select Name, LastPointCreationTime
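The route check can be wrapped in a small filter so a drill script can run it unattended. This is a minimal sketch: `unexpected_routes` is a hypothetical name, the subnet is the example value used above, and the sample routes are made up for illustration.

```shell
# Hypothetical helper: read `ip route show` output on stdin and print every
# route whose destination (first field) is not the allowed recovery subnet.
unexpected_routes() {
  awk -v allowed="$1" '$1 != allowed { print }'
}

# Illustrative input; on a real recovery host use:
#   ip route show | unexpected_routes 192.168.100.0/24
unexpected_routes 192.168.100.0/24 <<'EOF'
default via 10.0.0.1 dev eth0
192.168.100.0/24 dev eth1 proto kernel scope link src 192.168.100.5
EOF
```

With the sample input only the `default` line is printed, which is the kind of route that should be removed before a drill starts.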

Step 3: Execute Restore and Measure RTO

For each tiered system, measure the full recovery timeline:

  1. Detection to Decision - Time from simulated alert to restore decision
  2. Backup Locate - Time to identify and select the correct clean restore point
  3. Restore Execution - Time to restore data/VM/application from backup
  4. Validation - Time to verify data integrity and application functionality
  5. Service Restoration - Time until the system is fully operational

Recovery Timeline Measurement:
  T0: Incident declared (simulated ransomware detection)
  T1: Recovery team assembled and backup identified
  T2: Restore initiated from clean backup
  T3: Restore completed, integrity checks passed
  T4: Application validated and service restored

  Actual RTO = T4 - T0
  Actual RPO = T0 - backup_timestamp
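The two formulas reduce to simple subtraction once each milestone is recorded as an epoch timestamp, for example with `date +%s`. The sketch below assumes that convention; `elapsed_minutes` is a hypothetical helper and the timestamps are illustrative.

```shell
# Hypothetical helper: minutes between two epoch timestamps (seconds), rounded up.
elapsed_minutes() {
  echo $(( ($2 - $1 + 59) / 60 ))
}

# Illustrative milestones in epoch seconds.
backup_ts=1700000000   # timestamp of the clean restore point
t0=1700002700          # T0: incident declared, 45 minutes after the backup
t4=1700011700          # T4: service restored, 150 minutes after T0

echo "Actual RTO: $(elapsed_minutes "$t0" "$t4") min"         # prints "Actual RTO: 150 min"
echo "Actual RPO: $(elapsed_minutes "$backup_ts" "$t0") min"  # prints "Actual RPO: 45 min"
```

Rounding up is a deliberate choice here: a drill that takes 60 minutes and 1 second should not be reported as meeting a 60-minute target.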

Step 4: Validate Data Integrity Post-Restore

# Compare file counts between backup manifest and restored data
find /restored/data -type f | wc -l
# Compare against pre-backup manifest

# Verify database consistency after restore
pg_isready -h localhost -p 5432
psql -c "SELECT count(*) FROM critical_table;" -d restored_db

# Hash verification of critical files
sha256sum /restored/data/critical_config.xml
# Compare against known-good hash from backup manifest
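Checking one file by hand does not scale, so a manifest of hashes taken at backup time can be replayed against the restored tree. The sketch below assumes `sha256sum` from GNU coreutils. `build_manifest` and `verify_restore` are hypothetical helper names, and the temporary directories stand in for the real source and restore paths.

```shell
# Hypothetical helper: print "hash  ./relative/path" for every file under a directory.
build_manifest() {
  ( cd "$1" && find . -type f -exec sha256sum {} + | sort -k 2 )
}

# Hypothetical helper: check a restored tree against a manifest.
# $2 must be an absolute path because the check runs from inside $1.
# Returns non-zero if any file is missing or its hash differs.
verify_restore() {
  ( cd "$1" && sha256sum -c "$2" )
}

# Illustrative run using throwaway directories.
src=$(mktemp -d); restored=$(mktemp -d); manifest=$(mktemp)
printf 'config-v1\n' > "$src/critical_config.xml"
build_manifest "$src" > "$manifest"            # taken before the backup
cp "$src/critical_config.xml" "$restored/"     # stands in for the restore step
verify_restore "$restored" "$manifest" && echo "Data Integrity: PASS"
```

The manifest line count (`wc -l < "$manifest"`) also gives the expected file count for the `find ... | wc -l` comparison. Store the manifest outside the backed-up tree so ransomware cannot alter both together.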

Step 5: Test Credential Rotation and Security Hardening

After restore, validate that security controls are re-established:

  1. Rotate all service account passwords and API keys
  2. Verify MFA is enabled on all administrative accounts
  3. Confirm EDR/AV agents are running and reporting to management console
  4. Validate firewall rules block known C2 indicators
  5. Check that restored systems have latest security patches

Step 6: Document Results and Calculate Gap

Recovery Test Report:
  System: [Name]
  Tier: [1-4]
  RTO Target: [target]    Actual RTO: [measured]    Gap: [delta]
  RPO Target: [target]    Actual RPO: [measured]    Gap: [delta]
  Data Integrity: [PASS/FAIL]
  Application Validation: [PASS/FAIL]
  Security Controls Restored: [PASS/FAIL]

  Status: [MEETS TARGET / EXCEEDS TARGET (faster than required) / FAILS TARGET]
  Remediation Required: [description if FAILS]
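The gap and status fields can be computed from the measured values. This sketch assumes both figures are in minutes and reads the targets as strict limits ("< 1 hour"), so a measurement equal to the target counts as a failure. `evaluate_target` is a hypothetical helper name.

```shell
# Hypothetical helper: compare a measured value with its target (both in minutes).
# Gap is actual minus target, so a negative gap means faster than required.
evaluate_target() {
  gap=$(( $2 - $1 ))
  if [ "$2" -lt "$1" ]; then
    status="MEETS TARGET"
  else
    status="FAILS TARGET"
  fi
  echo "target=${1}min actual=${2}min gap=${gap}min status=${status}"
}

evaluate_target 240 150   # RTO for a Tier 2 system: gap=-90min, MEETS TARGET
evaluate_target 60 75     # RPO for a Tier 2 system: gap=15min, FAILS TARGET
```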

Key Concepts

| Term | Definition |
|------|------------|
| RTO | Recovery Time Objective: maximum acceptable downtime for a system after a disaster |
| RPO | Recovery Point Objective: maximum acceptable data loss measured in time |
| WRT | Work Recovery Time: time to verify system integrity after restore completes |
| MTD | Maximum Tolerable Downtime: absolute limit before unacceptable business impact |
| Clean Restore Point | A backup verified to be free of ransomware artifacts or encryption |
| Recovery Sequencing | The order in which interdependent systems must be restored |
| Air-Gapped Backup | Backup stored on media physically disconnected from the network |

Tools & Systems

| Tool | Purpose |
|------|---------|
| Veeam Backup & Replication | VM and physical server backup and restore |
| Commvault | Enterprise data protection and recovery orchestration |
| Rubrik | Cloud-native backup with ransomware recovery SLA |
| AWS Backup | Centralized backup for AWS services |
| Azure Backup | Microsoft cloud backup with immutable vault |
| Restic | Open-source encrypted backup tool |
| Velero | Kubernetes cluster backup and restore |

Common Pitfalls

  • Not testing restores regularly: Backups that are never tested often fail when needed. Test quarterly at minimum.
  • Ignoring recovery sequencing: Restoring an application before its database dependency causes cascading failures.
  • Skipping credential rotation: Restored systems may contain compromised credentials that allow re-infection.
  • Using production network for testing: Recovery tests on production networks risk spreading simulated or real infections.
  • Measuring RTO without WRT: Restore completion is not recovery completion. Include validation and hardening time.
  • No immutable backups: If ransomware can encrypt or delete the backups themselves, recovery from backup may be impossible. Use air-gapped or immutable storage.
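For the recovery sequencing pitfall, one way to derive a safe restore order is a topological sort of the dependency list. The sketch below uses the standard `tsort` utility. The system names and dependencies are illustrative, and `restore_order` is a hypothetical wrapper.

```shell
# Hypothetical wrapper: read "dependency dependent" pairs on stdin and print
# one system per line, with every dependency listed before what relies on it.
restore_order() {
  tsort
}

# Illustrative dependency list: the first system must be restored before the second.
restore_order <<'EOF'
active-directory core-database
core-database erp
active-directory email
core-database crm
EOF
```

The exact order among independent systems may vary between `tsort` implementations, but each dependency always precedes its dependents. A circular dependency in the list is reported as an error, which is worth catching before the drill rather than during it.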

References

  • NIST SP 800-184: Guide for Cybersecurity Event Recovery
  • CISA Ransomware Guide: https://www.cisa.gov/stopransomware
  • Veeam RTO/RPO Best Practices: https://www.veeam.com/blog/recovery-time-recovery-point-objectives.html
  • NIST CSF 2.0 RC.RP (Recovery Planning)