Agent Skills: Expect — Diff-Aware AI Browser Testing

Diff-aware AI browser testing — analyzes git changes, generates targeted test plans, and executes them via agent-browser. Reads git diff to determine what changed, maps changes to affected pages via route map, generates a test plan scoped to the diff, and runs it with pass/fail reporting. Use when testing UI changes, verifying PRs before merge, running regression checks on changed components, or validating that recent code changes don't break the user-facing experience.

Tags: testing
ID: yonatangross/orchestkit/expect

Install this agent skill into your local environment:

pnpm dlx add-skill https://github.com/yonatangross/orchestkit/tree/HEAD/src/skills/expect

Skill Files


src/skills/expect/SKILL.md

Skill Metadata

Name
expect
Description
"Diff-aware AI browser testing — analyzes git changes, generates targeted test plans, and executes them via agent-browser (Rust daemon + CDP, ARIA-tree-first). Reads git diff to determine what changed, maps changes to affected pages via route map, generates a test plan scoped to the diff, and runs it with pass/fail reporting. Use when testing UI changes, verifying PRs before merge, running regression checks on changed components, or validating that recent code changes don't break the user-facing experience."

Expect — Diff-Aware AI Browser Testing

Analyze git changes, generate targeted test plans, and execute them via AI-driven browser automation.

Note: If disableSkillShellExecution is enabled (CC 2.1.91), the agent-browser install check won't run. Verify it's installed: npx agent-browser --version.

/ork:expect                              # Auto-detect changes, test affected pages
/ork:expect -m "test the checkout flow"  # Specific instruction
/ork:expect --flow login                 # Replay a saved test flow
/ork:expect --target branch              # Test all changes on current branch vs main
/ork:expect -y                           # Skip plan review, run immediately

Core principle: Only test what changed. Git diff drives scope — no wasted cycles on unaffected pages.

Argument Resolution

ARGS = "[-m <instruction>] [--target unstaged|branch|commit] [--flow <slug>] [-y]"

# Parse from full argument string
import re
raw = ""  # Full argument string from CC

INSTRUCTION = None
TARGET = "unstaged"  # Default: test unstaged changes
FLOW = None
SKIP_REVIEW = False

# Extract -m "instruction"
m_match = re.search(r'-m\s+["\']([^"\']+)["\']|-m\s+(\S+)', raw)
if m_match:
    INSTRUCTION = m_match.group(1) or m_match.group(2)

# Extract --target
t_match = re.search(r'--target\s+(unstaged|branch|commit)', raw)
if t_match:
    TARGET = t_match.group(1)

# Extract --flow
f_match = re.search(r'--flow\s+(\S+)', raw)
if f_match:
    FLOW = f_match.group(1)

# Extract -y
if '-y' in raw.split():
    SKIP_REVIEW = True

STEP 0: MCP Probe + Prerequisite Check

# memory is alwaysLoad in .mcp.json (CC 2.1.121+, #1541) — probe below kept as fallback for older CC:
ToolSearch(query="select:mcp__memory__search_nodes")

# Verify agent-browser is available (Rust-native, no Playwright)
Bash("command -v agent-browser || npx agent-browser --version")
# If missing: "Install agent-browser: npm i -g agent-browser"

# Load agent-browser's own self-serving skill/workflow docs (required since 0.25.x)
Bash("agent-browser skills get agent-browser")

CRITICAL: Task Management

# 1. Create main task IMMEDIATELY
TaskCreate(
  subject="Expect: test changed code",
  description="Diff-aware browser testing pipeline",
  activeForm="Running diff-aware browser tests"
)

# 2. Create subtasks for each pipeline phase
TaskCreate(subject="Check fingerprint (skip if unchanged)", activeForm="Checking fingerprint")  # id=2
TaskCreate(subject="Scan git diff and classify changes", activeForm="Scanning diff")            # id=3
TaskCreate(subject="Map changes to routes/URLs", activeForm="Mapping routes")                   # id=4
TaskCreate(subject="Generate AI test plan", activeForm="Generating test plan")                   # id=5
TaskCreate(subject="Execute tests via agent-browser", activeForm="Executing browser tests")     # id=6
TaskCreate(subject="Compile test report", activeForm="Compiling report")                        # id=7

# 3. Set dependencies for sequential phases
TaskUpdate(taskId="3", addBlockedBy=["2"])  # Diff scan needs fingerprint check
TaskUpdate(taskId="4", addBlockedBy=["3"])  # Route map needs diff results
TaskUpdate(taskId="5", addBlockedBy=["4"])  # Test plan needs route map
TaskUpdate(taskId="6", addBlockedBy=["5"])  # Execution needs test plan
TaskUpdate(taskId="7", addBlockedBy=["6"])  # Report needs execution results

# 4. Before starting each task, verify it's unblocked
task = TaskGet(taskId="2")  # Verify blockedBy is empty

# 5. Update status as you progress
TaskUpdate(taskId="2", status="in_progress")  # When starting
TaskUpdate(taskId="2", status="completed")    # When done — repeat for each subtask

Pipeline Overview

Git Diff → Route Map → Fingerprint Check → Test Plan → Execute → Report

| Phase | What | Output | Reference |
|-------|------|--------|-----------|
| 1. Fingerprint | SHA-256 hash of changed files | Skip if unchanged since last run | references/fingerprint.md |
| 2. Diff Scan | Parse git diff, classify changes | ChangesFor data (files, components, routes) | references/diff-scanner.md |
| 3. Route Map | Map changed files to affected pages/URLs | Scoped page list | references/route-map.md |
| 4. Test Plan | Generate AI test plan from diff + route map | Markdown test plan with steps | references/test-plan.md |
| 5. Execute | Run test plan via agent-browser | Pass/fail per step, screenshots | references/execution.md |
| 6. Report | Aggregate results, artifacts, exit code | Structured report + artifacts | references/report.md |

Phase 1: Fingerprint Check

Check if the current changes have already been tested:

Read(".expect/fingerprints.json")  # Previous run hashes
# Compare SHA-256 of changed files against stored fingerprints
# If match: "No changes since last test run. Use --force to re-run."
# If no match or --force: continue to Phase 2
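
A minimal sketch of the fingerprint gate in Python. The function names (compute_fingerprint, should_skip) and the store layout ({"last_run": <hash>}) are illustrative assumptions, not the skill's actual API:

```python
import hashlib
import json
from pathlib import Path

def compute_fingerprint(files: list[str]) -> str:
    """Stable SHA-256 digest over the contents of all changed files."""
    h = hashlib.sha256()
    for path in sorted(files):  # sorted -> independent of diff ordering
        h.update(path.encode())
        h.update(Path(path).read_bytes())
    return h.hexdigest()

def should_skip(files: list[str],
                store: str = ".expect/fingerprints.json",
                force: bool = False) -> bool:
    """True when the current diff was already tested and --force is absent."""
    if force:
        return False
    try:
        stored = json.loads(Path(store).read_text())
    except FileNotFoundError:
        return False  # first run: nothing recorded yet
    return stored.get("last_run") == compute_fingerprint(files)
```

Hashing path plus contents (rather than mtime) means a revert back to the tested state also skips, which is the desired behavior here.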

Load: Read("${CLAUDE_SKILL_DIR}/references/fingerprint.md")

Phase 2: Diff Scan

Analyze git changes based on --target:

if TARGET == "unstaged":
    diff = Bash("git diff")
    files = Bash("git diff --name-only")
elif TARGET == "branch":
    diff = Bash("git diff main...HEAD")
    files = Bash("git diff main...HEAD --name-only")
elif TARGET == "commit":
    diff = Bash("git diff HEAD~1")
    files = Bash("git diff HEAD~1 --name-only")

Classify each changed file into 3 levels:

  1. Direct — the file itself changed
  2. Imported — a file that imports the changed file
  3. Routed — the page/route that renders the changed component
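
The three levels can be sketched as follows, assuming repo sources are available as a path-to-text mapping. classify_changes and its grep-style import heuristic are illustrative, not the skill's actual scanner (which lives in references/diff-scanner.md):

```python
import re
from pathlib import Path

def classify_changes(changed: set[str], repo_files: dict[str, str]) -> dict:
    """Three-level impact classification: direct, imported, routed."""
    direct = set(changed)
    imported = set()
    for path, source in repo_files.items():
        if path in direct:
            continue
        for dep in changed:
            stem = Path(dep).stem  # "Header.tsx" -> "Header"
            # crude heuristic: any import statement ending in the module name
            if re.search(rf"from\s+['\"].*{re.escape(stem)}['\"]", source):
                imported.add(path)
    # routed = any affected file that lives under a routing directory
    routed = {p for p in direct | imported if "/app/" in p or "/pages/" in p}
    return {"direct": direct, "imported": imported, "routed": routed}
```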

Load: Read("${CLAUDE_SKILL_DIR}/references/diff-scanner.md")

Phase 3: Route Map

Map changed files to testable URLs using .expect/config.yaml:

# .expect/config.yaml
base_url: http://localhost:3000
route_map:
  "src/components/Header.tsx": ["/", "/about", "/pricing"]
  "src/app/auth/**": ["/login", "/signup", "/forgot-password"]
  "src/app/dashboard/**": ["/dashboard"]

If no route map exists, infer from Next.js App Router / Pages Router conventions.
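
Pattern-to-URL resolution might look like this. map_to_urls is a hypothetical helper, and fnmatch stands in for whatever glob engine the skill actually uses:

```python
import fnmatch

def map_to_urls(changed_files: list[str],
                route_map: dict[str, list[str]],
                fallback: str = "/") -> list[str]:
    """Resolve changed files to the URLs they affect via glob patterns."""
    urls = []
    for f in changed_files:
        matched = False
        for pattern, routes in route_map.items():
            if fnmatch.fnmatch(f, pattern):
                urls.extend(routes)
                matched = True
        if not matched:
            urls.append(fallback)  # unmapped change: test home as a proxy
    # de-duplicate while preserving first-seen order
    return list(dict.fromkeys(urls))
```

Note that fnmatch's `*` matches across `/`, so the `**` patterns in the example config behave as expected here without extra handling.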

Load: Read("${CLAUDE_SKILL_DIR}/references/route-map.md")

Phase 4: Test Plan Generation

Build an AI test plan scoped to the diff, using the scope strategy for the current target:

scope_strategy = get_scope_strategy(TARGET)  # See references/scope-strategy.md

prompt = f"""
{scope_strategy}

Changes: {diff_summary}
Affected pages: {affected_urls}
Instruction: {INSTRUCTION or "Test that the changes work correctly"}

Generate a test plan with:
1. Page-level checks (loads, no console errors, correct content)
2. Interaction tests (forms, buttons, navigation affected by the diff)
3. Visual regression (compare ARIA snapshots if saved)
4. Accessibility (axe-core scan on affected pages)
"""

If --flow specified, load saved flow from .expect/flows/{slug}.yaml instead of generating.

If -y is not set, present the plan to the user via AskUserQuestion for review before executing.

Load: Read("${CLAUDE_SKILL_DIR}/references/test-plan.md")

Phase 5: Execution

agent-browser 0.25.x Quick Primer

| Area | Command | Notes |
|------|---------|-------|
| Snapshot | agent-browser snapshot -i | ARIA tree w/ @eN refs. -C/--cursor was removed in 0.22 |
| Semantic locator | agent-browser find --role button "Continue" | Stable alternative to @eN refs |
| Interaction | fill @e1 "...", click @e2, press Enter, drag @e1 @e2, upload @e1 file.pdf | All take ARIA refs |
| Waits | wait --load networkidle, wait --text "Success", wait --fn "window.ready" | Event-driven, never sleep-based |
| Network | network route "*analytics*" --abort, network route "https://api/*" --body '{...}' | Intercept + stub |
| State | state save/load auth.json, --session-name <name> | Persist auth across runs |
| Vault | vault store github_pat, vault load github_pat | Encrypted credential store |
| Diff | diff snapshot, diff screenshot --baseline /tmp/x.png | ARIA + pixel diffing |
| Capture | screenshot --annotate, pdf, record start/stop | Evidence artifacts |
| Dashboard | agent-browser dashboard start (0.25+) | Browser-side runtime inspector on :4848 |

Run the test plan

expect_task = Agent(
  subagent_type="expect-agent",
  prompt=f"""Execute this test plan:
  {test_plan}

  For each step:
  1. Navigate to the URL
  2. Execute the test action
  3. Take a screenshot on failure
  4. Report PASS/FAIL with evidence
  """,
  run_in_background=True,
  model="sonnet",
  max_turns=50
)

# Stream agent-browser progress line-by-line instead of polling (CC 2.1.98+)
# Each stdout line from agent-browser arrives as a notification — useful for
# catching a failing step early rather than waiting for the full plan.
# Full pattern: Read("/Users/yonatangross/coding/yonatangross/orchestkit/plugins/ork/skills/chain-patterns/references/monitor-patterns.md")
Monitor(pid=expect_task.agent_id)

# For long test plans (>3 min typical), notify on completion — requires
# Remote Control + "Push when Claude decides" config (CC 2.1.110+).
# Skip silently if the user doesn't have Remote Control enabled.
if test_plan_duration_estimate > 180:
    PushNotification(
        title="ork:expect complete",
        body=f"{passed}/{total} steps passed on {len(affected_urls)} pages"
    )

Load: Read("${CLAUDE_SKILL_DIR}/references/execution.md")

Phase 6: Report

/ork:expect Report
═══════════════════════════════════════
Target: unstaged (3 files changed)
Pages tested: 4
Duration: 45s

Results:
  ✓ /login — form renders, submit works
  ✓ /signup — validation triggers on empty fields
  ✗ /dashboard — chart component crashes (TypeError)
  ✓ /settings — preferences save correctly

3 passed, 1 failed

Artifacts:
  .expect/reports/2026-03-26T16-30-00.json
  .expect/screenshots/dashboard-error.png
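
A sketch of how the JSON artifact and exit code could be produced. write_report and the per-step result-dict shape ({"step": ..., "status": "pass"|"fail"}) are assumptions for illustration:

```python
import datetime
import json
from pathlib import Path

def write_report(results: list[dict], out_dir: str = ".expect/reports"):
    """Persist structured results; exit code 1 if any step failed."""
    passed = sum(1 for r in results if r["status"] == "pass")
    report = {
        "timestamp": datetime.datetime.now().isoformat(timespec="seconds"),
        "passed": passed,
        "failed": len(results) - passed,
        "results": results,
    }
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    # colons are invalid in some filesystems, so mirror the 2026-03-26T16-30-00 style
    path = Path(out_dir) / f"{report['timestamp'].replace(':', '-')}.json"
    path.write_text(json.dumps(report, indent=2))
    return path, (0 if passed == len(results) else 1)
```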

Load: Read("${CLAUDE_SKILL_DIR}/references/report.md")

Saved Flows

Reusable test sequences stored in .expect/flows/:

# .expect/flows/login.yaml
name: Login Flow
steps:
  - navigate: /login
  - fill: { selector: "#email", value: "test@example.com" }
  - fill: { selector: "#password", value: "password123" }
  - click: button[type="submit"]
  - assert: { url: "/dashboard" }
  - assert: { text: "Welcome back" }

Run with: /ork:expect --flow login
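
A flow file can be sanity-checked before replay. This validator is an illustrative sketch, not part of the skill; validate_flow and ALLOWED_ACTIONS are assumptions derived from the step shapes shown above:

```python
ALLOWED_ACTIONS = {"navigate", "fill", "click", "assert"}

def validate_flow(flow: dict) -> list[str]:
    """Return a list of problems; an empty list means the flow is well-formed."""
    errors = []
    if "name" not in flow:
        errors.append("missing 'name'")
    for i, step in enumerate(flow.get("steps", [])):
        if len(step) != 1:
            errors.append(f"step {i}: expected exactly one action per step")
            continue
        (action,) = step  # single-key dict: unpack the action name
        if action not in ALLOWED_ACTIONS:
            errors.append(f"step {i}: unknown action '{action}'")
        if action == "fill" and not {"selector", "value"} <= set(step[action]):
            errors.append(f"step {i}: fill needs selector+value")
    return errors
```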

Auto-trigger after UI edits (M125 #2)

When the dev stack is live (/ork:dev), saving any .tsx, .jsx, .css, or .scss file (and Next.js route files like app/**/page.tsx, pages/**/*.tsx) emits a nudge to run /ork:expect <route>. The hook (posttool/ui-change-detector) is default-on and:

  • skips silently if /ork:dev hasn't booted (no agent-browser session to attach to);
  • enforces a 30-second cooldown per route to prevent spam on rapid saves;
  • honors .claude/state/expect-skip.<sessionId> as a per-session opt-out (write any content);
  • honors ORK_EXPECT_AUTO=0 for an env-level kill switch.

Route resolution: app/dashboard/page.tsx → /dashboard, pages/settings.tsx → /settings, component / global-style edits → / (home as proxy). Route groups like app/(marketing)/pricing/page.tsx strip to /pricing.
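
The resolution rules above can be sketched as follows (resolve_route is a hypothetical helper, not the hook's actual code):

```python
import re

def resolve_route(path: str) -> str:
    """Map a saved file path to the route /ork:expect should test."""
    # App Router: app/dashboard/page.tsx -> /dashboard
    m = re.match(r".*app/(.+?)/page\.tsx$", path)
    if m:
        # strip route groups like (marketing): they don't appear in the URL
        segments = [s for s in m.group(1).split("/")
                    if not (s.startswith("(") and s.endswith(")"))]
        return "/" + "/".join(segments)
    # Pages Router: pages/settings.tsx -> /settings
    m = re.match(r".*pages/(.+)\.tsx$", path)
    if m:
        return "/" + m.group(1)
    return "/"  # components and global styles fall back to home as a proxy
```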

ARIA snapshot recording (M125 #6)

After a passing run, the posttool/expect/snapshot-recorder hook persists the captured ARIA tree to .claude/state/expect-snapshots/<route-slug>/<parent-commit>.json. Subsequent /ork:expect <route> --diff runs compare against the most recent prior snapshot for that route, surfacing structural regressions (added/removed buttons, label changes, hierarchy shifts) without needing a baseline screenshot.

For the snapshot recorder to fire, the expect run output must contain RUN_COMPLETED|passed, ROUTE|<route>, and ARIA|<json-summary> tags. The agent-browser-driven flow already emits these.
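
Structural diffing of two ARIA trees reduces to a set comparison over flattened (path, role, name) triples. This sketch assumes a nested dict with role/name/children keys; the recorder's actual JSON shape may differ:

```python
def flatten(node: dict, path: str = ""):
    """Yield (path, role, name) triples from a nested ARIA-tree dict."""
    here = f"{path}/{node.get('role', '?')}"
    yield here, node.get("role"), node.get("name", "")
    for child in node.get("children", []):
        yield from flatten(child, here)

def aria_diff(old: dict, new: dict) -> dict:
    """Set difference over flattened trees. Siblings with the same role
    share a path, so duplicates collapse (acceptable for a sketch)."""
    before, after = set(flatten(old)), set(flatten(new))
    return {"removed": before - after, "added": after - before}
```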

When NOT to Use

  • Unit tests — use /ork:cover instead
  • API-only changes — no browser UI to test
  • Generated files — skip build artifacts, lock files
  • Docs-only changes — unless you want to verify docs site rendering

Related Skills

  • agent-browser — Browser automation engine (required dependency)
  • ork:cover — Test suite generation (unit/integration/e2e)
  • ork:verify — Grade existing test quality
  • testing-e2e — Playwright patterns and best practices

References

Load on demand with Read("${CLAUDE_SKILL_DIR}/references/<file>"):

| File | Content |
|------|---------|
| fingerprint.md | SHA-256 gating logic |
| diff-scanner.md | Git diff parsing + 3-level classification |
| route-map.md | File-to-URL mapping conventions |
| test-plan.md | AI test plan generation prompt templates |
| execution.md | agent-browser orchestration patterns |
| report.md | Report format + artifact storage |
| config-schema.md | .expect/config.yaml full schema |
| aria-diffing.md | ARIA snapshot comparison for semantic diffing |
| scope-strategy.md | Test depth strategy per target mode |
| saved-flows.md | Markdown+YAML flow format, adaptive replay |
| rrweb-recording.md | rrweb DOM replay integration |
| human-review.md | AskUserQuestion plan review gate |
| ci-integration.md | GitHub Actions workflow + pre-push hooks |
| research.md | millionco/expect architecture analysis |

Version: 1.0.0 (March 2026) — Initial scaffold, M99 milestone