BDD Test Solution Audit
Goal: evaluate specification executability, flake resistance, maintainability, semantic/a11y quality, and AI-agent operability.
Adaptive Workflow
Workflow adapts based on repository size (auto-detected).
┌─────────────────────────────────────────────────────────────────┐
│ 1. DISCOVER → 2. ANALYZE → 3. SCORE → 4. REPORT → 5. ROADMAP │
└─────────────────────────────────────────────────────────────────┘
        (Small repos skip steps 2 and 5 — see the table below.)
| Repo Size | Steps | Sampling | Questions |
|-----------|-------|----------|-----------|
| Small (≤20 scenarios) | 1→3→4 | None | 1 question |
| Medium (21–100) | 1→2→3→4→5 | 30–50% | 2 questions |
| Large (100+) | Full | Stratified | 3 questions |
Step 1: Discovery & Auto-Inference
Target: {argument OR cwd}
Auto-detect (no user input needed):
| What | How to Detect |
|------|---------------|
| Stack | playwright.config.* → Playwright; playwright-bdd in package.json → playwright-bdd |
| Size | Count *.feature files and Scenario: lines |
| History | Check .bddready/history/index.json exists |
| CI | Check .github/workflows/, Jenkinsfile, .gitlab-ci.yml |
| Artifacts | Check playwright.config.* for trace/video/screenshot settings |
Output immediately:
Target: {path}
Stack: {stack} (auto-detected)
Size: {small/medium/large} ({N} features, {M} scenarios)
History: {yes/no} | CI: {yes/no} | Artifacts: {configured/missing}
See modules/discovery.md for detailed detection rules.
Step 2: Sampling (Medium/Large repos only)
Skip for small repos — analyze all scenarios.
For medium/large repos, use stratified sampling. See modules/sampling.md.
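A minimal sketch of one plausible stratification — grouping scenarios by feature directory and sampling the same fraction from each group, so every area of the suite is represented (modules/sampling.md is authoritative; the scenario shape here is assumed):

```javascript
// Stratified sampling sketch: strata are feature directories; take the
// same fraction from each, with at least one scenario per stratum.
function stratifiedSample(scenarios, fraction) {
  const byStratum = new Map();
  for (const s of scenarios) {
    const stratum = s.file.split("/").slice(0, -1).join("/"); // feature directory
    if (!byStratum.has(stratum)) byStratum.set(stratum, []);
    byStratum.get(stratum).push(s);
  }
  const sample = [];
  for (const group of byStratum.values()) {
    const take = Math.max(1, Math.round(group.length * fraction));
    sample.push(...group.slice(0, take));
  }
  return sample;
}
```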
Progress Indicator (Medium/Large repos)
For repositories with 50+ scenarios, show progress during analysis:
Analyzing BDD Test Solution...
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0%
[■■■■■■■■■■░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] 25%
✓ Discovery complete (playwright-bdd detected)
→ Analyzing features/auth/*.feature (8 scenarios)
[■■■■■■■■■■■■■■■■■■■■░░░░░░░░░░░░░░░░░░░░] 50%
→ Analyzing features/checkout/*.feature (12 scenarios)
[■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■░░░░░░░░░░] 75%
→ Scoring aspects...
[■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■] 100%
✓ Analysis complete
Progress stages:
- Discovery (10%)
- Feature file analysis (10–70%, proportional to file count)
- Step definition analysis (70-85%)
- Scoring (85-95%)
- Report generation (95-100%)
Update progress after each feature file or major step.
Step 3: Score Aspects
Score each aspect using rubrics from criteria/aspects.md.
Aspects and weights:
| # | Aspect | Weight |
|---|--------|--------|
| 1 | Executable Gherkin | 16% |
| 2 | Step Definitions Quality | 14% |
| 3 | Test Architecture | 14% |
| 4 | Selector Strategy | 12% |
| 5 | Waiting & Flake Resistance | 14% |
| 6 | Data & Environment | 10% |
| 7 | CI, Reporting & Artifacts | 10% |
| 8 | AI-Agent Operability | 10% |
Scoring: 0 (bad) / 5 (partial) / 10 (good) per criterion.
See modules/scoring.md for calculation formulas.
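Assuming the weights in the table and the 0/5/10 criterion scale, one plausible reading of the calculation (modules/scoring.md holds the actual formula; the aspect keys are illustrative):

```javascript
// Weighted overall score sketch. Each aspect score is the mean of its
// criterion scores (0, 5, or 10), rescaled to 0–100 before weighting.
const WEIGHTS = {
  executableGherkin: 0.16,
  stepDefinitions: 0.14,
  architecture: 0.14,
  selectors: 0.12,
  waiting: 0.14,
  dataEnvironment: 0.10,
  ciReporting: 0.10,
  aiOperability: 0.10,
};

function aspectScore(criterionScores) {
  const mean = criterionScores.reduce((a, b) => a + b, 0) / criterionScores.length;
  return mean * 10; // 0–10 scale → 0–100
}

function overallScore(aspectScores) {
  let total = 0;
  for (const [aspect, weight] of Object.entries(WEIGHTS)) {
    total += (aspectScores[aspect] ?? 0) * weight;
  }
  return Math.round(total);
}
```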
Step 4: Report
4.1 Terminal Output (Always)
Print ASCII dashboard with scores and issues. See modules/output-formats.md.
4.2 Issues by Severity
Classify using reference/severity.md:
- 🔴 CRITICAL — blocks reliable execution
- 🟡 WARNING — hinders speed/maintainability
- 🔵 INFO — optimizations
Every issue MUST have:
- Evidence (file path, pattern, or code snippet)
- Impact (why it matters)
- Effort estimate (Low/Medium/High)
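A minimal issue record carrying the three required fields, plus a completeness guard an agent could run before saving a report (the field names are illustrative, not a fixed schema):

```javascript
// Example issue record: evidence, impact, and effort are mandatory.
const issue = {
  id: "C1",
  severity: "CRITICAL",
  title: "Arbitrary sleeps in checkout flow",
  evidence: "features/checkout.feature:12 — `And I wait 5 seconds`",
  impact: "Adds dead time to every run and still races slow responses",
  effort: "Low",
};

function isCompleteIssue(i) {
  return Boolean(
    i.evidence && i.impact && ["Low", "Medium", "High"].includes(i.effort)
  );
}
```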
4.3 Save Reports
Save to .bddready/history/reports/:
- {REPORT_ID}.json — machine-readable
- {REPORT_ID}.md — human-readable
Update .bddready/history/index.json for delta tracking.
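Delta tracking can be as simple as appending a summary entry and diffing against the previous run. The index schema below is an assumption for illustration:

```javascript
// Append a report summary to the history index and record the score
// delta against the previous run (null on the first run).
function appendToIndex(index, report) {
  const previous = index.reports.at(-1);
  const delta = previous ? report.score - previous.score : null;
  index.reports.push({
    id: report.id,
    date: report.date,
    score: report.score,
    delta,
  });
  return index;
}
```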
4.4 HTML Report (Offer to User)
After showing terminal output, ask:
Would you like me to generate an interactive HTML report?
If yes, run:
node scripts/render-html.mjs .bddready/history/reports/{REPORT_ID}.json .bddready/history/reports/{REPORT_ID}.html
Interactive Fix Mode
After showing issues, offer to fix quick wins immediately.
Trigger Conditions
Offer interactive fixes when:
- At least 1 CRITICAL issue with Effort: Low
- Issue has a clear, automatable fix pattern
Flow
╔══════════════════════════════════════════════════════════════════╗
║ QUICK FIX AVAILABLE ║
╠══════════════════════════════════════════════════════════════════╣
║ [C1] Flake Resistance: Found 7 arbitrary sleeps ║
║ Fix: Replace `wait X seconds` with condition waits ║
║ Effort: Low | Files: 3 ║
║ ║
║ → Fix C1 now? [y/n/skip all] ║
╚══════════════════════════════════════════════════════════════════╝
Response Handling
| Response | Action |
|----------|--------|
| y / yes | Apply fix, show diff, continue to next fixable issue |
| n / no | Skip this issue, continue to next |
| skip all / s | Skip interactive mode, show full report |
Fixable Patterns
| Issue Pattern | Auto-Fix |
|---------------|----------|
| wait X seconds without condition | → waitFor with visibility/enabled check |
| Hardcoded sleep() | → waitForSelector() or waitForResponse() |
| CSS class selectors | → getByRole() / getByTestId() (suggest, confirm) |
| Missing trace: 'on-first-retry' | → Add to playwright.config |
| Duplicate step definitions | → Consolidate (show which to keep) |
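For the hardcoded-sleep pattern, the mechanical part of the fix is a targeted rewrite; a sketch, where the replacement selector is something the user must supply and confirm, and the result still needs human review:

```javascript
// Rewrite page.waitForTimeout(N) calls into page.waitForSelector(selector).
// The selector argument is user-provided — the tool cannot infer which
// element the sleep was papering over.
function fixHardcodedSleep(code, selector) {
  return code.replace(
    /page\.waitForTimeout\(\d+\)/g,
    `page.waitForSelector(${JSON.stringify(selector)})`
  );
}
```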
After Each Fix
✓ Fixed C1: Replaced 7 sleeps with condition waits
Modified: features/checkout.feature, features/auth.feature
→ Fix C2 now? [y/n/skip all]
Post-Fix Summary
╔══════════════════════════════════════════════════════════════════╗
║ FIX SUMMARY ║
╠══════════════════════════════════════════════════════════════════╣
║ ✓ C1: Fixed (7 sleeps → condition waits) ║
║ ✓ C3: Fixed (added trace-on-failure) ║
║ ✗ C2: Skipped (requires manual review) ║
║ ║
║ Files modified: 5 ║
║ New score estimate: 68 → 74 (+6) ║
╚══════════════════════════════════════════════════════════════════╝
Step 5: Roadmap (Medium/Large repos only)
Skip for small repos — provide inline recommendations instead.
| Phase | Focus |
|-------|-------|
| 1: Quick Wins | Remove sleeps, enable trace-on-failure, fix critical selectors |
| 2: Foundation | Thin step defs, proper fixtures, test isolation |
| 3: Advanced | Visual tests, a11y integration, CI optimization |
User Questions
Auto-Inference First
Before asking, try to infer from codebase:
| Question | Auto-Inference |
|----------|----------------|
| Primary goal? | Infer from issues: many sleeps → stability; bad selectors → AI-ready |
| Depth of changes? | Infer from repo size: small → quick wins; large → phased |
| CI constraints? | Read from config: worker count, timeout settings |
Minimal Question Set
Ask ONLY what cannot be inferred:
Small repos (1 question):
What is your priority: stability, speed, or AI-agent readability?
Medium repos (2 questions):
- What is your priority: stability, speed, or AI-agent readability?
- How deep can changes go: quick fixes only, or can we refactor?
Large repos (3 questions):
- What is your priority: stability, speed, or AI-agent readability?
- How deep can changes go: quick fixes only, medium refactor, or deep restructuring?
- Are there CI/environment constraints? (e.g., worker limits, no mocks, staging only)
Dynamic Questions (Only if triggered)
Ask ONLY if specific issues found:
| Trigger | Question |
|---------|----------|
| CRITICAL issues found | Which CRITICAL items should be fixed first? (list by ID) |
| Selector/a11y issues | Can we modify application markup (HTML), or tests only? |
| >10 WARNING issues | Which WARNING items are in scope this iteration? |
Semantic/A11y Refactoring Proposal
If Aspect 4 (Selector Strategy) or Aspect 8 (AI-Agent Operability) scores below 60, propose:
╔══════════════════════════════════════════════════════════════════╗
║ SEMANTIC/A11Y REFACTORING PROPOSAL ║
╠══════════════════════════════════════════════════════════════════╣
║ Your locators would be more stable with semantic HTML. ║
║ ║
║ Would you like me to help refactor: ║
║ [ ] Component markup (replace div onclick → button, add ARIA) ║
║ [ ] Test locators (migrate CSS → getByRole) ║
╚══════════════════════════════════════════════════════════════════╝
Ask only if user has access to modify application source code.
Reference Files
| File | Purpose |
|------|---------|
| criteria/aspects.md | Detailed scoring rubrics (0/5/10) |
| reference/severity.md | Issue classification rules |
| reference/bdd-best-practices.md | Best practices guide |
| modules/discovery.md | Discovery details |
| modules/sampling.md | Sampling strategy |
| modules/scoring.md | Score calculation |
| modules/output-formats.md | Output format specs |
| templates/report.html | HTML report template |
| scripts/render-html.mjs | HTML generator script |
Quick Reference: Workflow by Size
Small Repo (≤20 scenarios)
- Discover (auto-detect stack, size)
- Ask 1 question (priority)
- Analyze all scenarios
- Score aspects (simplified)
- Print terminal report + issues
- Interactive fix mode (if Low-effort CRITICAL issues)
- Offer HTML report
- Provide inline recommendations
Medium Repo (21–100 scenarios)
- Discover (auto-detect)
- Ask 2 questions
- Sample 30–50%
- Show progress (50+ scenarios)
- Full aspect scoring
- Terminal + saved reports
- Interactive fix mode
- Offer HTML report
- Phased roadmap (3 phases)
Large Repo (100+ scenarios)
- Discover (auto-detect)
- Ask 3 questions
- Stratified sampling
- Show progress (with stage updates)
- Full aspect scoring
- All report formats
- Interactive fix mode
- HTML report
- Detailed phased roadmap
- Propose a11y refactoring if applicable