Agent Skills: BDD Test Solution Audit

Analyzes BDD (Gherkin) + Playwright test solutions. Produces aspect scoring (0/5/10), an A–F grade, issues grouped by severity, and Markdown + HTML report formats for deep test-solution analysis.

ID: viktor-silakov/bdd-best-practices/auditing-bdd-tests

Install this agent skill to your local machine:

pnpm dlx add-skill https://github.com/viktor-silakov/bdd-best-practices/tree/HEAD/skills/auditing-bdd-tests

Skill Files

Browse the full folder contents for auditing-bdd-tests.


skills/auditing-bdd-tests/SKILL.md

Skill Metadata

Name
auditing-bdd-tests
Description
Analyzes BDD (Gherkin) + Playwright test solutions for spec quality, flake resistance, semantic/a11y locators, and AI-agent operability. Produces aspect scoring, grade, issues by severity, and improvement roadmap.

BDD Test Solution Audit

Goal: evaluate specification executability, flake resistance, maintainability, semantic/a11y quality, and AI-agent operability.

Adaptive Workflow

Workflow adapts based on repository size (auto-detected).

┌─────────────────────────────────────────────────────────────────┐
│ 1. DISCOVER → 2. ANALYZE → 3. SCORE → 4. REPORT → 5. ROADMAP   │
└─────────────────────────────────────────────────────────────────┘
     ↑                                                      │
     └──────────── Skip steps for small repos ──────────────┘

| Repo Size | Steps | Sampling | Questions |
|-----------|-------|----------|-----------|
| Small (≤20 scenarios) | 1→3→4 | None | 1 question |
| Medium (21–100) | 1→2→3→4→5 | 30–50% | 2 questions |
| Large (>100) | Full | Stratified | 3 questions |
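The size classification above can be sketched as a small helper (names and return shape are illustrative, not part of the skill's actual implementation):

```javascript
// Classify a repo by scenario count, per the sizing table.
function classifyRepo(scenarioCount) {
  if (scenarioCount <= 20) {
    return { size: 'small', steps: [1, 3, 4], sampling: 'none', questions: 1 };
  }
  if (scenarioCount <= 100) {
    return { size: 'medium', steps: [1, 2, 3, 4, 5], sampling: '30-50%', questions: 2 };
  }
  return { size: 'large', steps: [1, 2, 3, 4, 5], sampling: 'stratified', questions: 3 };
}

console.log(classifyRepo(18).size);  // → small
console.log(classifyRepo(64).size);  // → medium
console.log(classifyRepo(250).size); // → large
```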


Step 1: Discovery & Auto-Inference

Target: {argument OR cwd}

Auto-detect (no user input needed):

| What | How to Detect |
|------|---------------|
| Stack | `playwright.config.*` → Playwright; `playwright-bdd` in `package.json` → playwright-bdd |
| Size | Count `*.feature` files and `Scenario:` lines |
| History | Check whether `.bddready/history/index.json` exists |
| CI | Check `.github/workflows/`, `Jenkinsfile`, `.gitlab-ci.yml` |
| Artifacts | Check `playwright.config.*` for trace/video/screenshot settings |
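A minimal sketch of the stack-detection rule, assuming the agent already has a file listing and a parsed `package.json` (the function name and inputs are hypothetical):

```javascript
// Infer the test stack from a repo file listing and package.json contents.
function detectStack(files, pkg) {
  const hasPlaywrightConfig = files.some(f => /^playwright\.config\.(js|ts|mjs)$/.test(f));
  const deps = { ...(pkg.dependencies || {}), ...(pkg.devDependencies || {}) };
  if ('playwright-bdd' in deps) return 'playwright-bdd';
  if (hasPlaywrightConfig) return 'playwright';
  return 'unknown';
}

console.log(detectStack(['playwright.config.ts'], { devDependencies: { 'playwright-bdd': '^7.0.0' } }));
// → playwright-bdd
```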

Output immediately:

Target: {path}
Stack: {stack} (auto-detected)
Size: {small/medium/large} ({N} features, {M} scenarios)
History: {yes/no} | CI: {yes/no} | Artifacts: {configured/missing}

See modules/discovery.md for detailed detection rules.


Step 2: Sampling (Medium/Large repos only)

Skip for small repos — analyze all scenarios.

For medium/large repos, use stratified sampling. See modules/sampling.md.
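One way stratified sampling can work, as an illustration: take a fixed fraction from each feature directory so every functional area stays represented. The authoritative strategy is in modules/sampling.md; this sketch is an assumption.

```javascript
// Pick a proportional sample of scenarios from each feature directory.
function stratifiedSample(scenarios, fraction) {
  const byDir = new Map();
  for (const s of scenarios) {
    const dir = s.file.split('/').slice(0, -1).join('/');
    if (!byDir.has(dir)) byDir.set(dir, []);
    byDir.get(dir).push(s);
  }
  const sample = [];
  for (const group of byDir.values()) {
    const n = Math.max(1, Math.round(group.length * fraction)); // at least one per stratum
    sample.push(...group.slice(0, n));
  }
  return sample;
}
```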


Progress Indicator (Medium/Large repos)

For repositories with 50+ scenarios, show progress during analysis:

Analyzing BDD Test Solution...
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0%

[■■■■■■■■■■░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] 25%
✓ Discovery complete (playwright-bdd detected)
→ Analyzing features/auth/*.feature (8 scenarios)

[■■■■■■■■■■■■■■■■■■■■░░░░░░░░░░░░░░░░░░░░] 50%
→ Analyzing features/checkout/*.feature (12 scenarios)

[■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■░░░░░░░░░░] 75%
→ Scoring aspects...

[■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■] 100%
✓ Analysis complete

Progress stages:

  1. Discovery (10%)
  2. Feature file analysis (10–70%, proportional to file count)
  3. Step definition analysis (70–85%)
  4. Scoring (85–95%)
  5. Report generation (95–100%)

Update progress after each feature file or major step.
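The bar shown above can be rendered with a simple helper (40 cells, filled proportionally; a sketch, not the skill's required implementation):

```javascript
// Render a 40-cell progress bar for a given percent complete.
function renderProgress(percent) {
  const cells = 40;
  const filled = Math.round((percent / 100) * cells);
  return `[${'■'.repeat(filled)}${'░'.repeat(cells - filled)}] ${percent}%`;
}

console.log(renderProgress(25)); // 10 filled cells, 30 empty
```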


Step 3: Score Aspects

Score each aspect using rubrics from criteria/aspects.md.

Aspects and weights:

| # | Aspect | Weight |
|---|--------|--------|
| 1 | Executable Gherkin | 16% |
| 2 | Step Definitions Quality | 14% |
| 3 | Test Architecture | 14% |
| 4 | Selector Strategy | 12% |
| 5 | Waiting & Flake Resistance | 14% |
| 6 | Data & Environment | 10% |
| 7 | CI, Reporting & Artifacts | 10% |
| 8 | AI-Agent Operability | 10% |

Scoring: 0 (bad) / 5 (partial) / 10 (good) per criterion.

See modules/scoring.md for calculation formulas.
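As a sketch of how the weighted total could be computed, assuming each aspect's criterion scores (0/5/10) are averaged and the result is scaled to 0–100 (the authoritative formula is in modules/scoring.md):

```javascript
// Compute a 0-100 total from per-aspect criterion scores and weights.
function weightedScore(aspects) {
  // aspects: [{ weight: 0.16, criteria: [10, 5, 0] }, ...] — weights sum to 1.
  let total = 0;
  for (const a of aspects) {
    const avg = a.criteria.reduce((sum, c) => sum + c, 0) / a.criteria.length; // 0-10
    total += a.weight * (avg * 10); // scale to 0-100
  }
  return Math.round(total);
}

console.log(weightedScore([
  { weight: 0.16, criteria: [10, 10] }, // Executable Gherkin: perfect
  { weight: 0.84, criteria: [5, 5] },   // remaining aspects (combined): partial
])); // → 58
```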


Step 4: Report

4.1 Terminal Output (Always)

Print ASCII dashboard with scores and issues. See modules/output-formats.md.

4.2 Issues by Severity

Classify using reference/severity.md:

  • 🔴 CRITICAL — blocks reliable execution
  • 🟡 WARNING — hinders speed/maintainability
  • 🔵 INFO — optimizations

Every issue MUST have:

  • Evidence (file path, pattern, or code snippet)
  • Impact (why it matters)
  • Effort estimate (Low/Medium/High)

4.3 Save Reports

Save to .bddready/history/reports/:

  • {REPORT_ID}.json — machine-readable
  • {REPORT_ID}.md — human-readable

Update .bddready/history/index.json for delta tracking.

4.4 HTML Report (Offer to User)

After showing terminal output, ask:

Would you like me to generate an interactive HTML report?

If yes, run:

node scripts/render-html.mjs .bddready/history/reports/{REPORT_ID}.json .bddready/history/reports/{REPORT_ID}.html

Interactive Fix Mode

After showing issues, offer to fix quick wins immediately.

Trigger Conditions

Offer interactive fixes when:

  • At least 1 CRITICAL issue with Effort: Low
  • Issue has clear, automatable fix pattern

Flow

╔══════════════════════════════════════════════════════════════════╗
║                     QUICK FIX AVAILABLE                          ║
╠══════════════════════════════════════════════════════════════════╣
║  [C1] Flake Resistance: Found 7 arbitrary sleeps                 ║
║       Fix: Replace `wait X seconds` with condition waits         ║
║       Effort: Low | Files: 3                                     ║
║                                                                  ║
║  → Fix C1 now? [y/n/skip all]                                    ║
╚══════════════════════════════════════════════════════════════════╝

Response Handling

| Response | Action |
|----------|--------|
| `y` / `yes` | Apply fix, show diff, continue to next fixable issue |
| `n` / `no` | Skip this issue, continue to next |
| `skip all` / `s` | Skip interactive mode, show full report |

Fixable Patterns

| Issue Pattern | Auto-Fix |
|---------------|----------|
| `wait X seconds` without condition | → `waitFor` with visibility/enabled check |
| Hardcoded `sleep()` | → `waitForSelector()` or `waitForResponse()` |
| CSS class selectors | → `getByRole()` / `getByTestId()` (suggest, confirm) |
| Missing `trace: 'on-first-retry'` | → Add to `playwright.config` |
| Duplicate step definitions | → Consolidate (show which to keep) |
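The first two patterns can be rewritten mechanically, as in this illustrative transform (the `SELECTOR` placeholder is deliberate: the target element must be supplied by the agent or user, so fixes should always be confirmed before applying):

```javascript
// Suggest a condition-based wait in place of an arbitrary timeout.
function suggestWaitFix(code) {
  return code.replace(
    /await page\.waitForTimeout\(\d+\);?/g,
    "await page.locator(SELECTOR).waitFor({ state: 'visible' }); // SELECTOR: the element being waited on"
  );
}

console.log(suggestWaitFix('await page.waitForTimeout(5000);'));
```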

After Each Fix

✓ Fixed C1: Replaced 7 sleeps with condition waits
  Modified: features/checkout.feature, features/auth.feature
  
→ Fix C2 now? [y/n/skip all]

Post-Fix Summary

╔══════════════════════════════════════════════════════════════════╗
║                     FIX SUMMARY                                  ║
╠══════════════════════════════════════════════════════════════════╣
║  ✓ C1: Fixed (7 sleeps → condition waits)                        ║
║  ✓ C3: Fixed (added trace-on-failure)                            ║
║  ✗ C2: Skipped (requires manual review)                          ║
║                                                                  ║
║  Files modified: 5                                               ║
║  New score estimate: 68 → 74 (+6)                                ║
╚══════════════════════════════════════════════════════════════════╝

Step 5: Roadmap (Medium/Large repos only)

Skip for small repos — provide inline recommendations instead.

| Phase | Focus |
|-------|-------|
| 1: Quick Wins | Remove sleeps, enable trace-on-failure, fix critical selectors |
| 2: Foundation | Thin step defs, proper fixtures, test isolation |
| 3: Advanced | Visual tests, a11y integration, CI optimization |


User Questions

Auto-Inference First

Before asking, try to infer from codebase:

| Question | Auto-Inference |
|----------|----------------|
| Primary goal? | Infer from issues: many sleeps → stability; bad selectors → AI-ready |
| Depth of changes? | Infer from repo size: small → quick wins; large → phased |
| CI constraints? | Read from config: worker count, timeout settings |

Minimal Question Set

Ask ONLY what cannot be inferred:

Small repos (1 question):

What is your priority: stability, speed, or AI-agent readability?

Medium repos (2 questions):

  1. What is your priority: stability, speed, or AI-agent readability?
  2. How deep can changes go: quick fixes only, or can we refactor?

Large repos (3 questions):

  1. What is your priority: stability, speed, or AI-agent readability?
  2. How deep can changes go: quick fixes only, medium refactor, or deep restructuring?
  3. Are there CI/environment constraints? (e.g., worker limits, no mocks, staging only)

Dynamic Questions (Only if triggered)

Ask ONLY if specific issues found:

| Trigger | Question |
|---------|----------|
| CRITICAL issues found | Which CRITICAL items should be fixed first? (list by ID) |
| Selector/a11y issues | Can we modify application markup (HTML), or tests only? |
| >10 WARNING issues | Which WARNING items are in scope this iteration? |


Semantic/A11y Refactoring Proposal

If Aspect 4 (Selector Strategy) or Aspect 8 (AI-Agent Operability) scores below 60, propose:

╔══════════════════════════════════════════════════════════════════╗
║           SEMANTIC/A11Y REFACTORING PROPOSAL                     ║
╠══════════════════════════════════════════════════════════════════╣
║  Your locators would be more stable with semantic HTML.          ║
║                                                                  ║
║  Would you like me to help refactor:                             ║
║  [ ] Component markup (replace div onclick → button, add ARIA)   ║
║  [ ] Test locators (migrate CSS → getByRole)                     ║
╚══════════════════════════════════════════════════════════════════╝

Ask only if user has access to modify application source code.
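For the test-locator half of the proposal, a heuristic mapper like the following could seed the migration suggestions (the mappings are illustrative examples, not exhaustive rules, and each suggestion still needs an accessible-name argument filled in):

```javascript
// Map common CSS locator patterns to semantic getByRole suggestions.
function suggestSemanticLocator(css) {
  if (/\bbutton\b|\.btn\b/.test(css)) return "page.getByRole('button', { name: /* … */ })";
  if (/input\[type=['"]?text/.test(css)) return "page.getByRole('textbox', { name: /* … */ })";
  if (/\ba\b|\.link\b/.test(css)) return "page.getByRole('link', { name: /* … */ })";
  return null; // no confident suggestion; leave for manual review
}

console.log(suggestSemanticLocator('.btn-primary'));
```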


Reference Files

| File | Purpose |
|------|---------|
| criteria/aspects.md | Detailed scoring rubrics (0/5/10) |
| reference/severity.md | Issue classification rules |
| reference/bdd-best-practices.md | Best practices guide |
| modules/discovery.md | Discovery details |
| modules/sampling.md | Sampling strategy |
| modules/scoring.md | Score calculation |
| modules/output-formats.md | Output format specs |
| templates/report.html | HTML report template |
| scripts/render-html.mjs | HTML generator script |


Quick Reference: Workflow by Size

Small Repo (≤20 scenarios)

  1. Discover (auto-detect stack, size)
  2. Ask 1 question (priority)
  3. Analyze all scenarios
  4. Score aspects (simplified)
  5. Print terminal report + issues
  6. Interactive fix mode (if Low-effort CRITICAL issues)
  7. Offer HTML report
  8. Provide inline recommendations

Medium Repo (21–100 scenarios)

  1. Discover (auto-detect)
  2. Ask 2 questions
  3. Sample 30–50%
  4. Show progress (50+ scenarios)
  5. Full aspect scoring
  6. Terminal + saved reports
  7. Interactive fix mode
  8. Offer HTML report
  9. Phased roadmap (3 phases)

Large Repo (>100 scenarios)

  1. Discover (auto-detect)
  2. Ask 3 questions
  3. Stratified sampling
  4. Show progress (with stage updates)
  5. Full aspect scoring
  6. All report formats
  7. Interactive fix mode
  8. HTML report
  9. Detailed phased roadmap
  10. Propose a11y refactoring if applicable