Pentest Validation Skill

<default_to_action> When validating security findings:

REQUIRE explicit authorization for target URL
SCAN with qe-security-scanner (SAST + DAST + dependency + secrets)
ANALYZE with qe-security-reviewer + qe-security-auditor (parallel)
VALIDATE with qe-pentest-validator (graduated exploitation, parallel per vuln type)
REPORT only confirmed findings with PoC evidence ("No Exploit, No Report")
UPDATE exploit playbook with new patterns

Quality Gates:

Authorization confirmed before ANY exploitation
Target URL is staging/dev (NOT production)
Budget cap enforced ($15 default)
Time cap enforced (30 min default)
All exploitation attempts logged </default_to_action>

Quick Reference Card

The 4-Phase Pipeline

| Phase | Agent(s) | Purpose | Parallelism |
|-------|----------|---------|-------------|
| 1. Recon | qe-security-scanner | SAST, DAST, dependency scan, secrets | Internal parallel |
| 2. Analysis | qe-security-reviewer + qe-security-auditor | Code review + compliance check | Both in parallel |
| 3. Validation | qe-pentest-validator | Graduated exploit validation | Per-vuln-type parallel |
| 4. Report | qe-quality-gate | "No Exploit, No Report" filter | Sequential |

Graduated Exploitation Tiers

| Tier | Handler | Cost | Latency | Use When |
|------|---------|------|---------|----------|
| 1 | Agent Booster (WASM) | $0 | <1ms | Code pattern is conclusive (eval, innerHTML, hardcoded creds) |
| 2 | Haiku | $0.0002 | ~500ms | Need payload test against live target |
| 3 | Sonnet/Opus | $0.003-$0.015 | 2-5s | Full exploit chain with data proof |
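The "Use When" column can be read as a routing rule. The sketch below is one possible reading, not the skill's actual router: the `FindingContext` fields and the idea that `exploitation_tier` acts as a ceiling are assumptions made for illustration.

```typescript
// Hypothetical shapes: these field names are illustrative, not the skill's schema.
type Tier = 1 | 2 | 3;

interface FindingContext {
  patternConclusive: boolean; // e.g. eval, innerHTML, hardcoded creds found in source
  needsDataProof: boolean;    // a full exploit chain with data proof is required
}

// Pick the cheapest tier that can settle the finding, never exceeding maxTier.
function selectTier(ctx: FindingContext, maxTier: Tier): Tier {
  if (ctx.patternConclusive) return 1;
  const wanted: Tier = ctx.needsDataProof ? 3 : 2;
  return wanted <= maxTier ? wanted : maxTier;
}
```

With `maxTier` set to 2, a finding that would need Tier 3 proof is capped at a Tier 2 payload test.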

When to Use This Skill

| Scenario | Tier | Estimated Cost |
|----------|------|----------------|
| PR security review (source only) | 1 | $0 |
| Pre-release validation (staging) | 1-2 | $1-5 |
| Full pentest validation | 1-3 | $5-15 |
| Compliance audit evidence | 1-3 | $5-15 |

Configuration

pentest:
  target_url: https://staging.app.com    # REQUIRED for Tier 2-3
  source_repo: ./src                      # REQUIRED for Tier 1+
  exploitation_tier: 2                    # 1=pattern-only, 2=payload-test, 3=full-exploit
  vuln_types:                             # Which pipelines to run
    - injection                           # SQL, NoSQL, command injection
    - xss                                 # Reflected, stored, DOM XSS
    - auth                                # Auth bypass, session, JWT
    - ssrf                                # URL scheme abuse, metadata
  max_cost_usd: 15                        # Budget cap per run
  timeout_minutes: 30                     # Time cap per run
  require_authorization: true             # MUST confirm target ownership
  no_production: true                     # Block production URLs
  production_patterns:                    # URL patterns to block
    - "*.prod.*"
    - "api.*"
    - "www.*"
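How `production_patterns` are matched is not specified here. A minimal sketch, assuming the patterns are globs applied to the target's hostname and that `*` is the only wildcard; `globToRegExp` and `isProductionUrl` are hypothetical helper names.

```typescript
// Convert a glob such as "*.prod.*" into an anchored, case-insensitive RegExp.
// Only "*" is treated as a wildcard; every other character is matched literally.
function globToRegExp(glob: string): RegExp {
  const escaped = glob
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${escaped}$`, "i");
}

// True when the URL's hostname matches any blocked pattern.
// Assumption: patterns apply to the hostname, not to the full URL.
function isProductionUrl(targetUrl: string, patterns: string[]): boolean {
  const host = new URL(targetUrl).hostname;
  return patterns.some((p) => globToRegExp(p).test(host));
}
```

Under this reading, `api.*` also blocks a host such as `api.staging.app.com`, so staging hosts that begin with `api.` or `www.` would need narrower patterns.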

Safeguards (Mandatory)

Authorization Gate

Every pentest validation run MUST:

Display target URL and exploitation tier to user
Require explicit confirmation: "I own/authorized testing of this target"
Log authorization with timestamp
Block if target URL matches production patterns
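The gate above could be sketched as a single function that either returns a timestamped record or throws. This assumes the caller has already displayed the target URL and tier and collected the typed confirmation; `authorizeRun`, `AuthorizationRecord`, and the `isProduction` callback are hypothetical names, not part of the skill's API.

```typescript
interface AuthorizationRecord {
  targetUrl: string;
  exploitationTier: 1 | 2 | 3;
  confirmedAt: string; // ISO 8601 timestamp
}

// The phrase the operator must confirm; wording taken from the gate above.
const REQUIRED_CONFIRMATION = "I own/authorized testing of this target";

function authorizeRun(
  targetUrl: string,
  exploitationTier: 1 | 2 | 3,
  confirmation: string,
  isProduction: (url: string) => boolean, // hypothetical production matcher
  now: Date = new Date(),
): AuthorizationRecord {
  // Production targets are rejected before the confirmation is even considered.
  if (isProduction(targetUrl)) {
    throw new Error(`Blocked: ${targetUrl} matches a production pattern`);
  }
  if (confirmation.trim() !== REQUIRED_CONFIRMATION) {
    throw new Error("Blocked: explicit authorization was not confirmed");
  }
  // The returned record is what gets logged with its timestamp.
  return { targetUrl, exploitationTier, confirmedAt: now.toISOString() };
}
```

Throwing rather than returning `false` makes it harder for a caller to ignore a failed gate by accident.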

What This Skill Does NOT Do

Full autonomous reconnaissance (Nmap, Subfinder)
Zero-day exploit development
Attack targets without explicit authorization
Test production systems
Store actual exfiltrated data (only proof of access)
Social engineering or phishing simulation
Port scanning or service discovery

Validation Pipelines

Injection Pipeline

| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|--------|------------------|------------------|---------------|
| SQL injection | String concat in query | ' OR '1'='1 response diff | UNION SELECT data extraction |
| NoSQL injection | $where, $gt in query | Operator injection test | Collection enumeration |
| Command injection | exec(), system() calls | Command delimiter test | Reverse shell proof |
| LDAP injection | String concat in filter | Wildcard injection | Directory enumeration |

XSS Pipeline

| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|--------|------------------|------------------|---------------|
| Reflected XSS | No output encoding | <img onerror> reflection | Browser JS execution via Playwright |
| Stored XSS | innerHTML assignment | Payload stored + retrieved | Cookie theft PoC |
| DOM XSS | document.write(location) | Fragment injection | DOM manipulation proof |
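A Tier 1 check is a source pattern match with no traffic to the target. The sketch below shows the idea with three illustrative regular expressions; the rule list and the `scanSource` helper are assumptions, and a real scanner would need parsing and data-flow context that a line-by-line regex cannot provide.

```typescript
interface PatternHit {
  rule: string;
  line: number; // 1-based line number in the scanned source
}

// Illustrative rules only; they flag candidates, not proven vulnerabilities.
const TIER1_RULES: { rule: string; pattern: RegExp }[] = [
  { rule: "innerHTML assignment", pattern: /\.innerHTML\s*=(?!=)/ },
  { rule: "document.write(location)", pattern: /document\.write\s*\([^)]*location/ },
  { rule: "eval call", pattern: /\beval\s*\(/ },
];

// Test every rule against every line and collect the matches in source order.
function scanSource(source: string): PatternHit[] {
  const hits: PatternHit[] = [];
  source.split(/\r?\n/).forEach((text, index) => {
    for (const { rule, pattern } of TIER1_RULES) {
      if (pattern.test(text)) hits.push({ rule, line: index + 1 });
    }
  });
  return hits;
}
```

The negative lookahead in the first rule keeps comparisons such as `el.innerHTML == ""` from being flagged as assignments.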

Auth Pipeline

| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|--------|------------------|------------------|---------------|
| JWT none | No algorithm validation | Modified JWT accepted | Admin access with forged token |
| Session fixation | No session rotation | Pre-set session reused | Cross-user session hijack |
| Credential stuffing | No rate limiting | 100 attempts unblocked | Valid credential discovery |
| IDOR | No authorization check | Access other user data | Full CRUD on foreign resources |

SSRF Pipeline

| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|--------|------------------|------------------|---------------|
| Internal URL | User-controlled URL fetch | http://169.254.169.254 | Cloud metadata extraction |
| DNS rebinding | URL validation bypass | Rebind to internal IP | Internal service access |
| Protocol smuggling | URL scheme not restricted | file:///etc/passwd | File content in response |
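The Tier 2 payloads in this table probe for missing URL validation. As a reference point, the sketch below shows the kind of check a fetch endpoint would need in order to reject them; `ssrfRisk` is a hypothetical helper and the address list is deliberately incomplete (IPv6 is not handled, for example).

```typescript
// Returns a reason string when the URL looks like an SSRF probe, otherwise null.
function ssrfRisk(rawUrl: string): string | null {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return "unparseable URL";
  }
  // Protocol smuggling: anything other than http(s) is rejected outright.
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    return `scheme ${url.protocol} not allowed`;
  }
  const host = url.hostname;
  if (host === "localhost" || host === "169.254.169.254") {
    return "loopback or cloud metadata address";
  }
  // Dotted-quad hosts are checked against private and link-local IPv4 ranges.
  const octets = host.split(".").map(Number);
  if (octets.length === 4 && octets.every((o) => Number.isInteger(o) && o >= 0 && o <= 255)) {
    const [a, b] = octets;
    if (a === 10 || a === 127 || (a === 172 && b >= 16 && b <= 31) ||
        (a === 192 && b === 168) || (a === 169 && b === 254)) {
      return "private or link-local IPv4 address";
    }
  }
  return null;
}
```

A static check like this cannot catch DNS rebinding, because the hostname is resolved only when the request is made; that row of the table needs a live Tier 2 test.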

Agent Coordination

Orchestration Pattern

// Phase 1: Recon (parallel scans)
const phase1Results = await Task("Security Scan", {
  target: "./src",
  layers: { sast: true, dast: true, dependencies: true, secrets: true }
}, "qe-security-scanner");

// Phase 2: Analysis (reviewer and auditor run in parallel)
// Assumes each task resolves to an array of findings, so the two results can be flattened.
const phase2Results = (await Promise.all([
  Task("Code Security Review", {
    findings: phase1Results,
    depth: "comprehensive"
  }, "qe-security-reviewer"),

  Task("Compliance Audit", {
    findings: phase1Results,
    frameworks: ["owasp-top-10"]
  }, "qe-security-auditor")
])).flat();

// Phase 3: Validation (graduated exploitation)
const phase3Results = await Task("Exploit Validation", {
  findings: [...phase1Results, ...phase2Results],
  target_url: "https://staging.app.com",
  exploitation_tier: 2,
  vuln_types: ["injection", "xss", "auth", "ssrf"],
  max_cost_usd: 15,
  timeout_minutes: 30
}, "qe-pentest-validator");

// Phase 4: Report ("No Exploit, No Report" gate)
await Task("Security Quality Gate", {
  findings: phase3Results.confirmedFindings,
  gate: "no-exploit-no-report",
  require_poc: true
}, "qe-quality-gate");

Finding Classification

| Status | Meaning | Action |
|--------|---------|--------|
| confirmed-exploitable | Exploitation succeeded with PoC | Report with evidence |
| likely-exploitable | Partial exploitation, defenses detected | Report with caveats |
| not-exploitable | All exploitation attempts failed | Filter from report |
| inconclusive | WAF/defense blocked, unclear if vulnerable | Report for manual review |
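The classification table maps directly onto a small type and filter. This is a sketch: the four status strings come from the table, while the `Finding` shape and the `reportable` helper are assumptions.

```typescript
type FindingStatus =
  | "confirmed-exploitable"
  | "likely-exploitable"
  | "not-exploitable"
  | "inconclusive";

// Hypothetical finding shape; only the status field is taken from the table.
interface Finding {
  id: string;
  status: FindingStatus;
}

// Action per status, mirroring the table above.
const ACTIONS: Record<FindingStatus, string> = {
  "confirmed-exploitable": "Report with evidence",
  "likely-exploitable": "Report with caveats",
  "not-exploitable": "Filter from report",
  "inconclusive": "Report for manual review",
};

// Keep everything except findings whose exploitation attempts all failed.
function reportable(findings: Finding[]): Finding[] {
  return findings.filter((f) => f.status !== "not-exploitable");
}
```

Typing `ACTIONS` as `Record<FindingStatus, string>` makes the compiler reject a new status that has no action assigned.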

Exploit Playbook Memory

Namespace Structure

aqe/pentest/
 playbook/
  exploit/{vuln_type}/{tech_stack}/{technique}
  bypass/{defense_type}/{technique}
  payload/{vuln_type}/{variant}
 results/
  validation-{timestamp}
 poc/
  {finding_id}-poc
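Keys in this namespace can be built with small helpers so that every caller produces the same layout. A minimal sketch; `playbookKeys` and `segment` are hypothetical names, and the rule that a segment may not contain `/` is an assumption added to keep values from spilling into neighbouring slots.

```typescript
const ROOT = "aqe/pentest";

// Reject empty segments and "/" so a value cannot escape its slot in the key.
function segment(value: string): string {
  if (value.length === 0 || value.includes("/")) {
    throw new Error(`invalid key segment: "${value}"`);
  }
  return value;
}

// One builder per branch of the namespace tree shown above.
const playbookKeys = {
  exploit: (vulnType: string, techStack: string, technique: string): string =>
    `${ROOT}/playbook/exploit/${segment(vulnType)}/${segment(techStack)}/${segment(technique)}`,
  bypass: (defenseType: string, technique: string): string =>
    `${ROOT}/playbook/bypass/${segment(defenseType)}/${segment(technique)}`,
  payload: (vulnType: string, variant: string): string =>
    `${ROOT}/playbook/payload/${segment(vulnType)}/${segment(variant)}`,
  result: (timestamp: string): string =>
    `${ROOT}/results/validation-${segment(timestamp)}`,
  poc: (findingId: string): string =>
    `${ROOT}/poc/${segment(findingId)}-poc`,
};
```

For example, `playbookKeys.poc("F-12")` yields `aqe/pentest/poc/F-12-poc`.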

Learning Loop

Before validation: Query playbook for known patterns matching findings
During validation: Try known payloads first (higher success rate)
After validation: Store new successful patterns with confidence scores
Over time: Agent converges on most effective payloads per tech stack
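The loop above can be illustrated with an in-memory playbook. This is a sketch, not the ReasoningBank implementation: the `Playbook` class is hypothetical, and the confidence score (a smoothed success ratio) is an assumed scoring rule chosen so that untried payloads start at 0.5.

```typescript
interface PlaybookEntry {
  payload: string;
  successes: number;
  attempts: number;
}

// Confidence as a smoothed success ratio (an assumed scoring rule).
function confidence(e: PlaybookEntry): number {
  return (e.successes + 1) / (e.attempts + 2);
}

class Playbook {
  private entries = new Map<string, PlaybookEntry[]>();

  // Before validation: known payloads for a key, most confident first.
  query(key: string): PlaybookEntry[] {
    return [...(this.entries.get(key) ?? [])].sort(
      (a, b) => confidence(b) - confidence(a),
    );
  }

  // After validation: record the outcome of trying a payload.
  record(key: string, payload: string, succeeded: boolean): void {
    const list = this.entries.get(key) ?? [];
    let entry = list.find((e) => e.payload === payload);
    if (!entry) {
      entry = { payload, successes: 0, attempts: 0 };
      list.push(entry);
    }
    entry.attempts += 1;
    if (succeeded) entry.successes += 1;
    this.entries.set(key, list);
  }
}
```

Because `query` returns payloads in descending confidence, trying them in order gives the "known payloads first" behaviour, and repeated `record` calls shift the ordering toward what works on a given tech stack.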

Cost Optimization

Estimated Cost by Scenario

| Scenario | Tier Mix | Findings | Est. Cost | Est. Time |
|----------|----------|----------|-----------|-----------|
| PR check (source only) | 100% Tier 1 | 5 | $0 | <5s |
| Sprint validation | 70% T1, 30% T2 | 15 | $2-5 | 5-10 min |
| Release validation | 40% T1, 40% T2, 20% T3 | 25 | $8-15 | 15-30 min |
| Full pentest | 20% T1, 30% T2, 50% T3 | 40 | $15-30 | 30-60 min |

The full pentest estimate exceeds the default caps (max_cost_usd: 15, timeout_minutes: 30), so a run of that size requires raising both limits in the configuration.
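The budget and time caps from the configuration can be enforced with a small tracker that the validator consults before each exploitation attempt. A minimal sketch: `RunBudget` is a hypothetical class, with defaults taken from `max_cost_usd: 15` and `timeout_minutes: 30`.

```typescript
class RunBudget {
  private spentUsd = 0;

  constructor(
    private readonly maxCostUsd: number = 15,
    private readonly timeoutMinutes: number = 30,
    private readonly startedAtMs: number = Date.now(),
  ) {}

  // Record the cost of one exploitation attempt.
  charge(costUsd: number): void {
    this.spentUsd += costUsd;
  }

  // True while both the cost cap and the time cap still have headroom.
  canContinue(nowMs: number = Date.now()): boolean {
    const elapsedMinutes = (nowMs - this.startedAtMs) / 60000;
    return this.spentUsd < this.maxCostUsd && elapsedMinutes < this.timeoutMinutes;
  }
}
```

Passing the clock in as a parameter keeps the check deterministic in tests; in a real run the defaults fall back to `Date.now()`.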

Cost vs Shannon Comparison

| Metric | Shannon | AQE Pentest Validation |
|--------|---------|------------------------|
| Cost per run | ~$50 | $5-15 (graduated tiers) |
| Runtime | 60-90 min | 15-30 min (parallel pipelines) |
| False positive rate | Low (exploit-proven) | Low (same principle) |
| Learning | None (static prompts) | ReasoningBank playbook |

Success Metrics

| Metric | Target | Measurement |
|--------|--------|-------------|
| False positive reduction | >60% of findings eliminated | Pre/post validator comparison |
| Exploit confirmation rate | >80% of confirmed findings truly exploitable | Manual PoC verification |
| Cost per run | <$15 USD | Token tracking per pipeline |
| Time per run | <30 minutes | Execution time metrics |
| Playbook growth | 100+ patterns after 6 months | Memory namespace count |

Related Skills

security-testing - OWASP vulnerability scanning, SAST/DAST automation
compliance-testing - Regulatory compliance
api-testing-patterns - API security testing
chaos-engineering-resilience - Security under chaos

Remember

"No Exploit, No Report." A vulnerability scanner that can't prove exploitation delivers uncertain value. This skill transforms security findings from theoretical risks into proven vulnerabilities with evidence. Every confirmed finding comes with a reproducible proof-of-concept. Findings whose exploitation attempts all fail are filtered out before they reach the report.

Think proof, not prediction. Don't report what MIGHT be vulnerable. Prove what IS vulnerable.
