Agent Skills: Pentest Validation

Use when validating security findings from SAST/DAST scans, proving exploitability of reported vulnerabilities, eliminating false positives, or running the 4-phase pentest pipeline (recon, analysis, validation, report).

Category: Uncategorized
ID: proffesor-for-testing/agentic-qe/pentest-validation

Install this agent skill locally:

pnpm dlx add-skill https://github.com/proffesor-for-testing/agentic-qe/tree/HEAD/assets/skills/pentest-validation



Pentest Validation

<default_to_action>
When validating security findings:

  1. REQUIRE explicit authorization for target URL
  2. SCAN with qe-security-scanner (SAST + DAST + dependency + secrets)
  3. ANALYZE with qe-security-reviewer + qe-security-auditor (parallel)
  4. VALIDATE with qe-pentest-validator (graduated exploitation, parallel per vuln type)
  5. REPORT only confirmed findings with PoC evidence ("No Exploit, No Report")
  6. UPDATE exploit playbook with new patterns

Quality Gates:

  • Authorization confirmed before ANY exploitation
  • Target URL is staging/dev (NOT production)
  • Budget cap enforced ($15 default)
  • Time cap enforced (30 min default)
  • All exploitation attempts logged
</default_to_action>
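The budget and time caps in the quality gates above can be enforced by sending every exploitation attempt through a single guard object. This is a minimal sketch, not the skill's actual implementation. The `RunGuard` class, its `tryAttempt` method, and the per-attempt cost estimate are all hypothetical.

```javascript
// Hypothetical guard that every exploitation attempt passes through.
// Defaults mirror the quality gates: $15 budget, 30-minute time cap.
class RunGuard {
  constructor({ maxCostUsd = 15, timeoutMinutes = 30, now = () => Date.now() } = {}) {
    this.maxCostUsd = maxCostUsd;
    this.now = now;
    this.deadline = now() + timeoutMinutes * 60_000;
    this.spentUsd = 0;
    this.log = []; // every attempt is logged, whether allowed or refused
  }

  // Returns true (and books the cost) only if the attempt fits both caps.
  tryAttempt(findingId, estimatedCostUsd) {
    const withinTime = this.now() < this.deadline;
    const withinBudget = this.spentUsd + estimatedCostUsd <= this.maxCostUsd;
    const allowed = withinTime && withinBudget;
    if (allowed) this.spentUsd += estimatedCostUsd;
    this.log.push({ findingId, estimatedCostUsd, allowed, at: this.now() });
    return allowed;
  }
}

const guard = new RunGuard({ maxCostUsd: 1 });
guard.tryAttempt("F-1", 0.6); // true
guard.tryAttempt("F-2", 0.6); // false: would exceed the $1 cap
```

The guard checks the caps before each attempt and leaves in-flight calls alone. A production runner would probably also abort outstanding requests once the deadline passes.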

Quick Reference Card

The 4-Phase Pipeline

| Phase | Agent(s) | Purpose | Parallelism |
|-------|----------|---------|-------------|
| 1. Recon | qe-security-scanner | SAST, DAST, dependency scan, secrets | Internal parallel |
| 2. Analysis | qe-security-reviewer + qe-security-auditor | Code review + compliance check | Both in parallel |
| 3. Validation | qe-pentest-validator | Graduated exploit validation | Per-vuln-type parallel |
| 4. Report | qe-quality-gate | "No Exploit, No Report" filter | Sequential |

Graduated Exploitation Tiers

| Tier | Handler | Cost | Latency | Use When |
|------|---------|------|---------|----------|
| 1 | Agent Booster (WASM) | $0 | <1ms | Code pattern is conclusive (eval, innerHTML, hardcoded creds) |
| 2 | Haiku | $0.0002 | ~500ms | Need payload test against live target |
| 3 | Sonnet/Opus | $0.003-$0.015 | 2-5s | Full exploit chain with data proof |
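The tier table implies an escalation strategy: try the cheapest tier first and move up only when the result is inconclusive. The sketch below assumes hypothetical per-tier validator functions that resolve to `"confirmed"`, `"refuted"`, or `"inconclusive"`. The per-call costs come from the table, with Tier 3 pinned to its upper bound.

```javascript
// Per-call costs from the tier table; Tier 3 uses the $0.015 upper bound
// so that running totals stay conservative.
const TIER_COST_USD = { 1: 0, 2: 0.0002, 3: 0.015 };

// `validators[tier]` is a hypothetical async check that returns
// "confirmed", "refuted", or "inconclusive" for one finding.
async function validateGraduated(finding, validators, { maxTier = 3, hasTarget = false } = {}) {
  let spentUsd = 0;
  let lastTier = 0;
  for (let tier = 1; tier <= maxTier; tier++) {
    if (tier >= 2 && !hasTarget) break; // Tiers 2-3 need a live target URL
    const verdict = await validators[tier](finding);
    spentUsd += TIER_COST_USD[tier];
    lastTier = tier;
    if (verdict !== "inconclusive") return { tier, verdict, spentUsd };
  }
  return { tier: lastTier, verdict: "inconclusive", spentUsd };
}
```

Stopping at the first conclusive tier is where the savings come from. A finding settled by a pattern check never reaches a paid model call, and without a target URL the loop never leaves Tier 1.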

When to Use This Skill

| Scenario | Tier | Estimated Cost |
|----------|------|----------------|
| PR security review (source only) | 1 | $0 |
| Pre-release validation (staging) | 1-2 | $1-5 |
| Full pentest validation | 1-3 | $5-15 |
| Compliance audit evidence | 1-3 | $5-15 |


Configuration

pentest:
  target_url: https://staging.app.com    # REQUIRED for Tier 2-3
  source_repo: ./src                      # REQUIRED for Tier 1+
  exploitation_tier: 2                    # 1=pattern-only, 2=payload-test, 3=full-exploit
  vuln_types:                             # Which pipelines to run
    - injection                           # SQL, NoSQL, command injection
    - xss                                 # Reflected, stored, DOM XSS
    - auth                                # Auth bypass, session, JWT
    - ssrf                                # URL scheme abuse, metadata
  max_cost_usd: 15                        # Budget cap per run
  timeout_minutes: 30                     # Time cap per run
  require_authorization: true             # MUST confirm target ownership
  no_production: true                     # Block production URLs
  production_patterns:                    # URL patterns to block
    - "*.prod.*"
    - "api.*"
    - "www.*"
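A run can be rejected before any agent starts by checking the parsed configuration against the rules stated in its comments. This sketch assumes the YAML has already been loaded into a plain object, for example with a YAML library. The function name is hypothetical. Treating `require_authorization` and `no_production` as non-negotiable is a reading of the "MUST" comment and the Safeguards (Mandatory) section, not documented behavior.

```javascript
const KNOWN_VULN_TYPES = new Set(["injection", "xss", "auth", "ssrf"]);

// Checks a parsed `pentest` block against the rules in the config comments.
// Returns a list of problems; an empty list means the config passes.
function validatePentestConfig(cfg) {
  const errors = [];
  if (![1, 2, 3].includes(cfg.exploitation_tier)) {
    errors.push("exploitation_tier must be 1, 2, or 3");
  }
  if (!cfg.source_repo) errors.push("source_repo is required");
  if (cfg.exploitation_tier >= 2 && !cfg.target_url) {
    errors.push("target_url is required for Tier 2-3");
  }
  for (const t of cfg.vuln_types ?? []) {
    if (!KNOWN_VULN_TYPES.has(t)) errors.push(`unknown vuln_type: ${t}`);
  }
  if (!(cfg.max_cost_usd > 0)) errors.push("max_cost_usd must be positive");
  if (!(cfg.timeout_minutes > 0)) errors.push("timeout_minutes must be positive");
  if (cfg.require_authorization !== true) errors.push("require_authorization must be true");
  if (cfg.no_production !== true) errors.push("no_production must be true");
  return errors;
}
```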

Safeguards (Mandatory)

Authorization Gate

Every pentest validation run MUST:

  1. Display target URL and exploitation tier to user
  2. Require explicit confirmation: "I own/authorized testing of this target"
  3. Log authorization with timestamp
  4. Block if target URL matches production patterns
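The production-pattern block and the timestamped authorization record can be sketched as follows. This assumes the patterns are simple globs matched against the target's hostname, with `*` matching any run of characters. `authorizeRun` and its record shape are hypothetical. The caller is expected to append the returned record to an audit log.

```javascript
const PRODUCTION_PATTERNS = ["*.prod.*", "api.*", "www.*"];

// Convert a simple glob ("*" = any run of characters) to an anchored,
// case-insensitive RegExp, escaping every other regex metacharacter.
function globToRegExp(glob) {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&").replace(/\*/g, ".*");
  return new RegExp(`^${escaped}$`, "i");
}

function isProductionTarget(targetUrl, patterns) {
  const host = new URL(targetUrl).hostname;
  return patterns.some((p) => globToRegExp(p).test(host));
}

// Hypothetical gate: refuses production targets and unconfirmed runs,
// otherwise returns a timestamped record for the audit log.
function authorizeRun({ targetUrl, tier, confirmed, patterns = PRODUCTION_PATTERNS }) {
  if (isProductionTarget(targetUrl, patterns)) {
    throw new Error(`Blocked production target: ${targetUrl}`);
  }
  if (!confirmed) throw new Error("Explicit authorization not confirmed");
  return { targetUrl, tier, authorizedAt: new Date().toISOString() };
}
```

Because matching is by hostname, `api.*` also blocks hosts such as `api.staging.app.com`. The pattern list may need tuning for environments that put staging APIs under an `api.` prefix.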

What This Skill Does NOT Do

  • Full autonomous reconnaissance (Nmap, Subfinder)
  • Zero-day exploit development
  • Attack targets without explicit authorization
  • Test production systems
  • Store actual exfiltrated data (only proof of access)
  • Social engineering or phishing simulation
  • Port scanning or service discovery

Validation Pipelines

Injection Pipeline

| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|--------|------------------|------------------|---------------|
| SQL injection | String concat in query | `' OR '1'='1` response diff | `UNION SELECT` data extraction |
| NoSQL injection | `$where`, `$gt` in query | Operator injection test | Collection enumeration |
| Command injection | `exec()`, `system()` calls | Command delimiter test | Reverse shell proof |
| LDAP injection | String concat in filter | Wildcard injection | Directory enumeration |
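Tier 1 of the injection pipeline looks for code patterns such as string-concatenated queries and `exec()`/`system()` calls. The sketch below approximates that with line-level regular expressions. It is not the Agent Booster engine, and the rule set is illustrative only. Regexes like these miss multi-line and indirect data flows, so a line with no hits should be read as "not flagged" rather than "proven safe".

```javascript
// Hypothetical Tier 1 rules: cheap line-level regexes for likely injection sinks.
const TIER1_RULES = [
  // SQL keyword followed later on the line by a quote + concatenation or ${...}
  { type: "sql-injection", re: /\b(SELECT|INSERT|UPDATE|DELETE)\b[^\n]*(["'`]\s*\+|\$\{)/i },
  { type: "nosql-injection", re: /\$where\b/ },
  { type: "command-injection", re: /\b(exec|execSync|system)\s*\(/ },
];

// Returns one hit per (rule, line) match, with 1-based line numbers.
function tier1Scan(source) {
  const hits = [];
  source.split("\n").forEach((line, i) => {
    for (const rule of TIER1_RULES) {
      if (rule.re.test(line)) hits.push({ type: rule.type, line: i + 1 });
    }
  });
  return hits;
}
```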

XSS Pipeline

| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|--------|------------------|------------------|---------------|
| Reflected XSS | No output encoding | `<img onerror>` reflection | Browser JS execution via Playwright |
| Stored XSS | `innerHTML` assignment | Payload stored + retrieved | Cookie theft PoC |
| DOM XSS | `document.write(location)` | Fragment injection | DOM manipulation proof |
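Once the response body is in hand, the reflected-XSS Tier 2 check (an `<img onerror>` payload reflected in the response) reduces to a string comparison. In this sketch the marker, the exact payload, and the three result labels are hypothetical. An unencoded reflection alone does not prove execution, because a Content Security Policy or the surrounding HTML context can still stop it. The Tier 3 browser check settles that question.

```javascript
// Hypothetical Tier 2 probe: a unique alphanumeric marker lets the checker
// tell its own payload apart from anything already on the page.
function makeXssProbe(marker) {
  return `<img src=x onerror="console.log('${marker}')">`;
}

// Classify how the probe came back in an HTTP response body.
function classifyReflection(body, marker) {
  if (body.includes(makeXssProbe(marker))) return "reflected-unencoded"; // escalate to Tier 3
  if (body.includes(marker)) return "reflected-transformed"; // encoded or filtered
  return "not-reflected";
}
```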

Auth Pipeline

| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|--------|------------------|------------------|---------------|
| JWT `none` algorithm | No algorithm validation | Modified JWT accepted | Admin access with forged token |
| Session fixation | No session rotation | Pre-set session reused | Cross-user session hijack |
| Credential stuffing | No rate limiting | 100 attempts unblocked | Valid credential discovery |
| IDOR | No authorization check | Access another user's data | Full CRUD on foreign resources |
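For the JWT `none` row, the Tier 2 payload is a token whose header declares `"alg": "none"` and whose signature segment is empty. This sketch assumes Node.js 16 or later for the `base64url` Buffer encoding. `tamperToken` and the `role` claim in the test are hypothetical, since the claims that grant admin access depend on the application.

```javascript
// base64url-encode a JSON value without padding, as the JWS compact form uses.
function b64urlJson(value) {
  return Buffer.from(JSON.stringify(value)).toString("base64url");
}

// Build an unsigned token: the header declares alg "none" and the signature
// segment is empty. A server that accepts it is not enforcing its algorithm.
function forgeNoneToken(claims) {
  return `${b64urlJson({ alg: "none", typ: "JWT" })}.${b64urlJson(claims)}.`;
}

// Hypothetical helper: copy the claims of an observed token and override some,
// producing the "modified JWT" that the Tier 2 check submits.
function tamperToken(observedToken, overrides) {
  const [, payloadB64] = observedToken.split(".");
  const claims = JSON.parse(Buffer.from(payloadB64, "base64url").toString("utf8"));
  return forgeNoneToken({ ...claims, ...overrides });
}
```

If the application answers a request carrying the tampered token as if it were valid, the finding moves from "No algorithm validation" to "Modified JWT accepted".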

SSRF Pipeline

| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|--------|------------------|------------------|---------------|
| Internal URL | User-controlled URL fetch | `http://169.254.169.254` | Cloud metadata extraction |
| DNS rebinding | URL validation bypass | Rebind to internal IP | Internal service access |
| Protocol smuggling | URL scheme not restricted | `file:///etc/passwd` | File content in response |
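For the SSRF pipeline, a helper that labels a URL as internal serves two purposes: choosing Tier 2 payloads and checking whether an application's URL validation would have rejected them. This sketch checks only literal IPv4 addresses, `localhost`, and non-HTTP schemes. Other hostnames return `false` because they must be resolved first, and IPv6 is not handled. DNS rebinding cannot be caught here at all, because the check has to run against the address actually connected to.

```javascript
// Hypothetical classifier: does this URL point at an internal resource?
// Literal-address check only; hostnames would need DNS resolution.
function isInternalUrl(raw) {
  const url = new URL(raw);
  if (!["http:", "https:"].includes(url.protocol)) return true; // file:, gopher:, ...
  const host = url.hostname;
  if (host === "localhost" || host.endsWith(".localhost")) return true;
  const m = host.match(/^(\d+)\.(\d+)\.(\d+)\.(\d+)$/);
  if (!m) return false; // non-literal hostname: resolve before deciding
  const [a, b] = m.slice(1).map(Number);
  return (
    a === 10 || a === 127 || a === 0 ||
    (a === 169 && b === 254) || // link-local, including cloud metadata
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
}
```

The WHATWG URL parser normalizes numeric hosts, so an input such as `http://2130706433/` arrives as `127.0.0.1` and is caught by the same range checks.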


Agent Coordination

Orchestration Pattern

// Assumes Task(description, input, agentType) resolves with the agent's output,
// and that Phases 1-2 yield arrays of findings.

// Phase 1: Recon (parallel scans)
const phase1Results = await Task("Security Scan", {
  target: "./src",
  layers: { sast: true, dast: true, dependencies: true, secrets: true }
}, "qe-security-scanner");

// Phase 2: Analysis (parallel review)
const [reviewResults, auditResults] = await Promise.all([
  Task("Code Security Review", {
    findings: phase1Results,
    depth: "comprehensive"
  }, "qe-security-reviewer"),

  Task("Compliance Audit", {
    findings: phase1Results,
    frameworks: ["owasp-top-10"]
  }, "qe-security-auditor")
]);
const phase2Results = [...reviewResults, ...auditResults];

// Phase 3: Validation (graduated exploitation)
const phase3Results = await Task("Exploit Validation", {
  findings: [...phase1Results, ...phase2Results],
  target_url: "https://staging.app.com",
  exploitation_tier: 2,
  vuln_types: ["injection", "xss", "auth", "ssrf"],
  max_cost_usd: 15,
  timeout_minutes: 30
}, "qe-pentest-validator");

// Phase 4: Report ("No Exploit, No Report" gate)
await Task("Security Quality Gate", {
  findings: phase3Results.confirmedFindings,
  gate: "no-exploit-no-report",
  require_poc: true
}, "qe-quality-gate");

Finding Classification

| Status | Meaning | Action |
|--------|---------|--------|
| confirmed-exploitable | Exploitation succeeded with PoC | Report with evidence |
| likely-exploitable | Partial exploitation, defenses detected | Report with caveats |
| not-exploitable | All exploitation attempts failed | Filter from report |
| inconclusive | WAF/defense blocked, unclear if vulnerable | Report for manual review |
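The classification table can be applied mechanically once each exploitation attempt has a recorded outcome. In this sketch the attempt shape (an `outcome` plus an optional `poc`) and the action labels are hypothetical. An exploit without a PoC is downgraded to `likely-exploitable`, since the Phase 4 gate runs with `require_poc: true`.

```javascript
// Map all attempts on one finding to a status from the table.
// Strongest evidence wins: exploit with PoC > partial > blocked > failed.
function classifyFinding(attempts) {
  const has = (outcome) => attempts.some((a) => a.outcome === outcome);
  if (attempts.some((a) => a.outcome === "exploited" && a.poc)) return "confirmed-exploitable";
  if (has("exploited") || has("partial")) return "likely-exploitable";
  if (has("blocked")) return "inconclusive";
  return "not-exploitable";
}

const ACTION = {
  "confirmed-exploitable": "report-with-evidence",
  "likely-exploitable": "report-with-caveats",
  "inconclusive": "manual-review",
  "not-exploitable": "filter",
};

// "No Exploit, No Report": findings whose action is "filter" never reach the report.
function buildReport(findings) {
  return findings
    .map((f) => ({ id: f.id, status: classifyFinding(f.attempts) }))
    .filter((f) => ACTION[f.status] !== "filter")
    .map((f) => ({ ...f, action: ACTION[f.status] }));
}
```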


Exploit Playbook Memory

Namespace Structure

aqe/pentest/
 playbook/
  exploit/{vuln_type}/{tech_stack}/{technique}
  bypass/{defense_type}/{technique}
  payload/{vuln_type}/{variant}
 results/
  validation-{timestamp}
 poc/
  {finding_id}-poc
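Keys in the playbook namespace can be built by small helpers that mirror the namespace tree above, so that no caller assembles paths by hand. The helper names are hypothetical, and the memory store's own read/write API is not shown. The segment check keeps a value containing `/` from escaping its level of the hierarchy.

```javascript
const ROOT = "aqe/pentest";

// Reject empty segments or segments containing "/".
function seg(value) {
  const s = String(value ?? "");
  if (s === "" || s.includes("/")) throw new Error(`invalid key segment: "${s}"`);
  return s;
}

// Hypothetical key builders that mirror the namespace tree.
const pentestKeys = {
  exploit: (vulnType, techStack, technique) =>
    `${ROOT}/playbook/exploit/${seg(vulnType)}/${seg(techStack)}/${seg(technique)}`,
  bypass: (defenseType, technique) =>
    `${ROOT}/playbook/bypass/${seg(defenseType)}/${seg(technique)}`,
  payload: (vulnType, variant) => `${ROOT}/playbook/payload/${seg(vulnType)}/${seg(variant)}`,
  result: (timestamp) => `${ROOT}/results/validation-${seg(timestamp)}`,
  poc: (findingId) => `${ROOT}/poc/${seg(findingId)}-poc`,
};
```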

Learning Loop

  1. Before validation: Query playbook for known patterns matching findings
  2. During validation: Try known payloads first (higher success rate)
  3. After validation: Store new successful patterns with confidence scores
  4. Over time: Agent converges on most effective payloads per tech stack
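The learning loop needs a confidence score per payload, but the source does not specify the formula. This sketch assumes a Laplace-smoothed success rate, (successes + 1) / (attempts + 2). Under that formula an untried payload starts at 0.5, and each recorded outcome shifts its rank. The entry shape and function names are hypothetical.

```javascript
// Hypothetical playbook entry: { payload, successes, attempts }.
function confidence(entry) {
  return (entry.successes + 1) / (entry.attempts + 2);
}

// Step 2: try known payloads highest-confidence first (input is not mutated).
function orderPayloads(entries) {
  return [...entries].sort((a, b) => confidence(b) - confidence(a));
}

// Step 3: record an outcome, creating the entry if the payload is new.
function recordOutcome(entries, payload, succeeded) {
  let entry = entries.find((e) => e.payload === payload);
  if (!entry) {
    entry = { payload, successes: 0, attempts: 0 };
    entries.push(entry);
  }
  entry.attempts += 1;
  if (succeeded) entry.successes += 1;
  return entry;
}
```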

Cost Optimization

Estimated Cost by Scenario

| Scenario | Tier Mix | Findings | Est. Cost | Est. Time |
|----------|----------|----------|-----------|-----------|
| PR check (source only) | 100% Tier 1 | 5 | $0 | <5s |
| Sprint validation | 70% T1, 30% T2 | 15 | $2-5 | 5-10 min |
| Release validation | 40% T1, 40% T2, 20% T3 | 25 | $8-15 | 15-30 min |
| Full pentest | 20% T1, 30% T2, 50% T3 | 40 | $15-30 | 30-60 min |

Cost vs Shannon Comparison

| Metric | Shannon | AQE Pentest Validation |
|--------|---------|------------------------|
| Cost per run | ~$50 | $5-15 (graduated tiers) |
| Runtime | 60-90 min | 15-30 min (parallel pipelines) |
| False positive rate | Low (exploit-proven) | Low (same principle) |
| Learning | None (static prompts) | ReasoningBank playbook |


Success Metrics

| Metric | Target | Measurement |
|--------|--------|-------------|
| False positive reduction | >60% of findings eliminated | Pre/post validator comparison |
| Exploit confirmation rate | >80% of confirmed findings truly exploitable | Manual PoC verification |
| Cost per run | <$15 USD | Token tracking per pipeline |
| Time per run | <30 minutes | Execution time metrics |
| Playbook growth | 100+ patterns after 6 months | Memory namespace count |
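The first four targets in the success-metrics table reduce to a few ratios per run. This is a minimal sketch with hypothetical field names. It reads "false positive reduction" as the share of raw findings that the validator kept out of the report. Playbook growth is left out because it is measured across runs, not per run.

```javascript
// Compare one run's counts against the per-run targets in the table above.
function evaluateRun({ rawFindings, reportedFindings, confirmed, manuallyVerified, costUsd, minutes }) {
  const fpReduction = rawFindings === 0 ? 0 : (rawFindings - reportedFindings) / rawFindings;
  // Undefined when nothing was confirmed; such a run does not meet the target.
  const confirmationRate = confirmed === 0 ? null : manuallyVerified / confirmed;
  const meetsTargets =
    fpReduction > 0.6 &&
    confirmationRate !== null && confirmationRate > 0.8 &&
    costUsd < 15 &&
    minutes < 30;
  return { fpReduction, confirmationRate, meetsTargets };
}
```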



Remember

"No Exploit, No Report." A vulnerability scanner that can't prove exploitation delivers uncertain value. This skill turns security findings from theoretical risks into proven vulnerabilities backed by evidence. Every confirmed finding comes with a reproducible proof-of-concept. Findings that cannot be exploited are filtered out before they reach the report, and inconclusive ones are flagged for manual review.

Think proof, not prediction. Don't report what MIGHT be vulnerable. Prove what IS vulnerable.