Browser Automation Skill

Web browser automation using agent-browser with AI-optimized snapshots. Snapshots identify elements by short refs (@e1, @e2) instead of returning the full DOM, which reduces context size by 93%.

Core Workflow

# 1. Navigate to page
agent-browser open <url>

# 2. Get accessibility tree with element refs
agent-browser snapshot -i    # -i = interactive elements only

# 3. Interact using refs from snapshot
agent-browser click @e2
agent-browser fill @e3 "text"

# 4. Re-snapshot after page changes
agent-browser snapshot -i
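
Steps 3 and 4 can be folded into one helper so that every interaction is followed by a fresh snapshot. This is a minimal sketch, not part of agent-browser: the function name `act` and the `AGENT_BROWSER` override variable are hypothetical, and only the `snapshot -i` command shown above is assumed to exist.

```shell
# Binary to call. AGENT_BROWSER is a hypothetical override used for dry runs
# (for example AGENT_BROWSER=echo); it is not an agent-browser feature.
AB="${AGENT_BROWSER:-agent-browser}"

# Run one interaction, then print a fresh interactive snapshot so the
# next command can use up-to-date refs. Stops if the interaction fails.
act() {
  "$AB" "$@" && "$AB" snapshot -i
}

# Usage: act click @e2
#        act fill @e3 "text"
```

Because the snapshot only runs when the interaction succeeds, a failed click does not produce a misleading snapshot.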

Quick Reference

Navigation

| Command | Description |
|---------|-------------|
| open <url> | Navigate to URL |
| back | Go back |
| forward | Go forward |
| reload | Reload page |
| close | Close browser |

Snapshots (AI-Optimized)

| Command | Description |
|---------|-------------|
| snapshot | Full accessibility tree |
| snapshot -i | Interactive elements only (buttons, links, inputs) |
| snapshot -c | Compact (remove empty elements) |
| snapshot -d 3 | Limit depth to 3 levels |
| screenshot [path] | Capture screenshot (base64 if no path) |

Interaction

| Command | Description |
|---------|-------------|
| click <sel> | Click element |
| fill <sel> <text> | Clear and fill input |
| type <sel> <text> | Type with key events |
| press <key> | Press key (Enter, Tab, etc.) |
| hover <sel> | Hover element |
| select <sel> <val> | Select dropdown option |
| check/uncheck <sel> | Toggle checkbox |
| scroll <dir> [px] | Scroll page |

Get Info

| Command | Description |
|---------|-------------|
| get text <sel> | Get text content |
| get html <sel> | Get innerHTML |
| get value <sel> | Get input value |
| get attr <sel> <attr> | Get attribute |
| get title | Get page title |
| get url | Get current URL |

Wait

| Command | Description |
|---------|-------------|
| wait <selector> | Wait for element |
| wait <ms> | Wait milliseconds |
| wait --text "text" | Wait for text |
| wait --url "pattern" | Wait for URL |
| wait --load networkidle | Wait for load state |

Sessions

| Command | Description |
|---------|-------------|
| --session <name> | Use isolated session |
| session list | List active sessions |

Selectors

Element Refs (Recommended)

# Get refs from snapshot
agent-browser snapshot -i
# Output: button "Submit" [ref=e2]

# Use ref to interact
agent-browser click @e2
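
When a script drives the browser, the refs have to be pulled out of the snapshot text. The sketch below assumes refs always appear as `[ref=eN]`, as in the sample output above. The helper names `list_refs` and `ref_for` are hypothetical, and the second sample line (the textbox) is made up for illustration.

```shell
# List every element ref found in snapshot text on stdin, one per line,
# prefixed with "@" so it can be passed straight to click or fill.
list_refs() {
  grep -o '\[ref=e[0-9][0-9]*\]' | sed 's/^\[ref=/@/; s/\]$//'
}

# Print the ref of the first snapshot line that contains the given text.
ref_for() {
  grep -F -- "$1" | list_refs | head -n 1
}

# Example on captured text; in practice, pipe `agent-browser snapshot -i` in.
sample='button "Submit" [ref=e2]
textbox "Email" [ref=e3]'
printf '%s\n' "$sample" | ref_for 'Submit'   # prints @e2
```

Matching on visible text is fragile when several elements share a label, so it is worth checking the snapshot by eye before relying on `ref_for`.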

CSS Selectors

agent-browser click "#submit"
agent-browser fill ".email-input" "test@test.com"

Semantic Locators

agent-browser find role button click --name "Submit"
agent-browser find label "Email" fill "test@test.com"
agent-browser find testid "login-btn" click

Examples

Login Flow

agent-browser open https://example.com/login
agent-browser snapshot -i
agent-browser fill @e2 "user@example.com"
agent-browser fill @e3 "password123"
agent-browser click @e4
agent-browser wait --url "**/dashboard"
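
For anything beyond a demo, the credentials are better kept out of the script. The following sketch wraps the same login steps in a function that reads them from the environment. The variable names `LOGIN_USER`, `LOGIN_PASS`, and `AGENT_BROWSER` are hypothetical, and the refs @e2 to @e4 are copied from the example above, so they should be confirmed against the real snapshot output.

```shell
# Binary to call. AGENT_BROWSER is a hypothetical override for dry runs.
AB="${AGENT_BROWSER:-agent-browser}"

login() {
  # Abort with a clear message if either variable is unset or empty.
  : "${LOGIN_USER:?set LOGIN_USER}" "${LOGIN_PASS:?set LOGIN_PASS}"

  # Each step runs only if the previous one succeeded.
  "$AB" open https://example.com/login &&
  "$AB" snapshot -i &&
  "$AB" fill @e2 "$LOGIN_USER" &&
  "$AB" fill @e3 "$LOGIN_PASS" &&
  "$AB" click @e4 &&
  "$AB" wait --url "**/dashboard"
}

# Usage: LOGIN_USER=user@example.com LOGIN_PASS=... login
```

The password still appears as a command-line argument, so it can be visible to other processes on a shared machine.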

Form Submission

agent-browser open https://example.com/contact
agent-browser snapshot -i
agent-browser fill @e1 "John Doe"
agent-browser fill @e2 "john@example.com"
agent-browser fill @e3 "Hello, this is my message"
agent-browser click @e4
agent-browser wait --text "Thank you"

Data Extraction

agent-browser open https://example.com/products
agent-browser snapshot -i
# Iterate through product refs
agent-browser get text @e1  # Product name
agent-browser get text @e2  # Price
agent-browser get attr @e3 href  # Link
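
The "iterate through product refs" step could be written as a loop. This is a sketch under the assumption that the refs of interest have already been read off the snapshot. The function name `dump_text` and the `AGENT_BROWSER` override are hypothetical.

```shell
# Binary to call. AGENT_BROWSER is a hypothetical override for dry runs.
AB="${AGENT_BROWSER:-agent-browser}"

# Print one "ref<TAB>text" line for each ref passed as an argument.
dump_text() {
  for ref in "$@"; do
    printf '%s\t%s\n' "$ref" "$("$AB" get text "$ref")"
  done
}

# Usage: dump_text @e1 @e2 > products.tsv
```

Tab-separated output keeps each ref next to its text, which makes it straightforward to join names and prices afterwards.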

Multi-Session (Swarm)

# Session 1: Navigator
agent-browser --session nav open https://example.com
agent-browser --session nav state save auth.json

# Session 2: Scraper (uses same auth)
agent-browser --session scrape state load auth.json
agent-browser --session scrape open https://example.com/data
agent-browser --session scrape snapshot -i
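
Since each session is isolated, several pages can be scraped concurrently with ordinary shell job control. The sketch below is one way to do that. The session names `scrape-N`, the function names, and the `AGENT_BROWSER` override are hypothetical; only the `--session`, `open`, and `snapshot -i` usage shown above is assumed.

```shell
# Binary to call. AGENT_BROWSER is a hypothetical override for dry runs.
AB="${AGENT_BROWSER:-agent-browser}"

# Open one URL in its own named session and print its interactive snapshot.
scrape() {
  name=$1
  url=$2
  "$AB" --session "$name" open "$url" &&
  "$AB" --session "$name" snapshot -i
}

# Scrape all URLs in parallel, one session and one output file per URL.
scrape_all() {
  outdir=$1
  shift
  i=0
  for url in "$@"; do
    i=$((i + 1))
    scrape "scrape-$i" "$url" > "$outdir/scrape-$i.txt" &
  done
  wait   # block until every background scrape has finished
}

# Usage: scrape_all ./out https://example.com/data https://example.com/products
```

Writing each session to its own file avoids interleaved output from the background jobs.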

Integration with Claude Flow

MCP Tools

All browser operations are available as MCP tools with the browser/ prefix:

browser/open
browser/snapshot
browser/click
browser/fill
browser/screenshot
etc.

Memory Integration

# Store successful patterns
npx @claude-flow/cli memory store --namespace browser-patterns --key "login-flow" --value "snapshot->fill->click->wait"

# Retrieve before similar task
npx @claude-flow/cli memory search --query "login automation"
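
To store a pattern only when the flow actually worked, the store command can be chained behind the flow itself. This is a sketch that reuses the `memory store` invocation above unchanged. The function name `run_and_remember` and the `CLAUDE_FLOW` override variable are hypothetical.

```shell
# Run a command and, only if it succeeds, store the pattern under the key.
# CLAUDE_FLOW is a hypothetical override for dry runs (for example
# CLAUDE_FLOW=echo). The expansion is left unquoted on purpose so that the
# default splits into the two words "npx" and "@claude-flow/cli".
run_and_remember() {
  key=$1
  pattern=$2
  shift 2
  "$@" && ${CLAUDE_FLOW:-npx @claude-flow/cli} memory store \
    --namespace browser-patterns --key "$key" --value "$pattern"
}

# Usage: run_and_remember login-flow "snapshot->fill->click->wait" ./login.sh
```

Here `./login.sh` stands for any script or function that performs the flow and returns a non-zero status on failure.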

Hooks

# Pre-browse hook (get context)
npx @claude-flow/cli hooks pre-edit --file "browser-task.ts"

# Post-browse hook (record success)
npx @claude-flow/cli hooks post-task --task-id "browse-1" --success true

Tips

Always use snapshots - They are optimized for AI and provide element refs
Prefer the -i flag - Returns only interactive elements, so the output is smaller
Use refs, not selectors - More reliable and deterministic
Re-snapshot after navigation - Page state changes, so take a new snapshot before interacting
Use sessions for parallel work - Each session is isolated
