Web Browser Skill (agent-browser)
Use agent-browser for web automation. It runs a headless Chromium instance by default and exposes a CLI optimized for AI agents.
Full command reference:
agent-browser --help
Installation
npm install -g agent-browser
agent-browser install # Download Chromium
# Linux only:
agent-browser install --with-deps # Install system deps
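The installation steps can be collected into one setup function. This is a minimal sketch, assuming a POSIX shell with npm on the PATH. The function name setup_agent_browser is hypothetical, Linux is detected with uname -s, and the script runs only the install commands listed above.

```shell
#!/bin/sh
# Hypothetical setup helper built only from the install commands above.
set -eu

# Usage: setup_agent_browser
setup_agent_browser() {
  # Install the CLI globally only if it is not already on the PATH.
  if ! command -v agent-browser >/dev/null 2>&1; then
    npm install -g agent-browser
  fi

  # Download Chromium.
  agent-browser install

  # Linux only: also install the system dependencies.
  if [ "$(uname -s)" = "Linux" ]; then
    agent-browser install --with-deps
  fi
}
```

Running it a second time skips the npm step because agent-browser is already on the PATH.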
Core Workflow (recommended)
- Open a page
  agent-browser open https://example.com
- Get a snapshot (refs)
  agent-browser snapshot -i # Interactive elements only
  # or JSON for machine parsing
  agent-browser snapshot -i --json
- Interact using refs
  agent-browser click @e2
  agent-browser fill @e3 "test@example.com"
  agent-browser get text @e1
- Re-snapshot after changes
  agent-browser snapshot -i --json
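Refs (@e1, @e2, …) come from the most recent snapshot and identify elements deterministically, which makes them well suited to AI workflows. Re-snapshot after the page changes, since refs from an earlier snapshot may no longer match the current page.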
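The workflow can be scripted as two steps, with a pause in between while the agent reads the saved snapshot and picks refs. A minimal sketch, assuming a POSIX shell. The function names open_and_snapshot and fill_and_submit, the output file names, and the refs in the usage comments are hypothetical, because real refs have to be taken from the latest snapshot.

```shell
#!/bin/sh
# Hypothetical helpers mirroring the core workflow: open, snapshot, act, re-snapshot.
set -eu

# Steps 1-2: open a page and save its interactive-element snapshot as JSON.
# Usage: open_and_snapshot https://example.com before.json
open_and_snapshot() {
  local url="$1"
  local out="$2"
  agent-browser open "$url"
  agent-browser snapshot -i --json > "$out"
}

# Steps 3-4: act on refs the agent picked from the saved snapshot, then
# re-snapshot, because the refs may have changed along with the page.
# Usage: fill_and_submit @e3 @e2 "test@example.com" after.json
fill_and_submit() {
  local email_ref="$1"
  local submit_ref="$2"
  local email="$3"
  local out="$4"
  agent-browser fill "$email_ref" "$email"
  agent-browser click "$submit_ref"
  agent-browser snapshot -i --json > "$out"
}
```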
Common Commands
agent-browser open <url> # Navigate (alias: goto)
agent-browser snapshot # Accessibility tree with refs
agent-browser click <sel|@ref>
agent-browser fill <sel|@ref> <text>
agent-browser type <sel|@ref> <text>
agent-browser press <key> # e.g. Enter, Tab, Control+a
agent-browser get text <sel|@ref>
agent-browser screenshot [path] # Use --full for full page
agent-browser close # Close browser
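Several of the common commands combine into a single task. The sketch below, assuming a POSIX shell, opens a page, types a query, presses Enter, prints an element's text, saves a full-page screenshot, and closes the browser. The function name search_and_capture and its arguments are hypothetical, and placing --full after the path is an assumption about argument order.

```shell
#!/bin/sh
# Hypothetical helper composed only from the common commands above.
set -eu

# Usage: search_and_capture URL INPUT QUERY RESULT SHOT_PATH
search_and_capture() {
  local url="$1"     # page to open
  local input="$2"   # selector or @ref of the text field
  local query="$3"   # text to type into the field
  local result="$4"  # selector or @ref whose text is printed
  local shot="$5"    # where to save the screenshot

  agent-browser open "$url"
  agent-browser type "$input" "$query"
  agent-browser press Enter
  agent-browser get text "$result"   # text goes to stdout
  # Assumption: --full may follow the path, as other flags do in this doc.
  agent-browser screenshot "$shot" --full
  agent-browser close
}
```

If pressing Enter navigates to a new page, refs from an earlier snapshot may no longer apply, so passing a selector for the result element is the safer choice.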
Semantic Finders (optional)
agent-browser find role button click --name "Submit"
agent-browser find label "Email" fill "test@test.com"
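Semantic finders locate elements by label, or by role and name, rather than by ref, which can be convenient when a form's labels are known in advance. A minimal sketch, assuming a POSIX shell and a login form with fields labeled Email and Password and a button named Sign in. Those labels and the function name login_with_finders are hypothetical.

```shell
#!/bin/sh
# Hypothetical login helper that uses semantic finders instead of refs.
set -eu

# Usage: login_with_finders https://example.com/login user@example.com "$PASSWORD"
login_with_finders() {
  local url="$1"
  local email="$2"
  local password="$3"

  agent-browser open "$url"
  # Find inputs by their label text (the labels are assumptions).
  agent-browser find label "Email" fill "$email"
  agent-browser find label "Password" fill "$password"
  # Find the button by role and name (the name is an assumption).
  agent-browser find role button click --name "Sign in"
}
```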
Helpful Options
- Headed mode (visible browser):
  agent-browser open https://example.com --headed
- Persistent profile (cookies/logins):
  agent-browser --profile ~/.myapp-profile open https://example.com
- Isolated sessions:
  agent-browser --session agent1 open https://example.com
- Agent-friendly JSON output:
  agent-browser snapshot -i --json
  agent-browser get text @e1 --json
- Local files (file://):
  agent-browser --allow-file-access open file:///path/to/page.html
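Isolated sessions let several agents work in parallel without sharing browser state. A minimal sketch, assuming a POSIX shell and that --session combines with snapshot and close the same way it does with open. The function name snapshot_in_sessions, the session names agent1, agent2, and so on, and the output file layout are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: snapshot each URL in its own isolated session, in parallel.
# Assumes --session applies to snapshot and close the same way it does to open.
set -eu

# Usage: snapshot_in_sessions ./out https://example.com https://example.org
snapshot_in_sessions() {
  local outdir="$1"
  local n=0
  local url
  shift
  for url in "$@"; do
    n=$((n + 1))
    # Each background subshell drives its own session: agent1, agent2, ...
    (
      session="agent$n"
      agent-browser --session "$session" open "$url"
      agent-browser --session "$session" snapshot -i --json > "$outdir/$session.json"
      agent-browser --session "$session" close
    ) &
  done
  # Plain wait blocks until every job finishes; its own status is always 0,
  # so check the JSON files if failures matter.
  wait
}
```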
When to Use
Use this skill whenever the agent needs to browse the web, inspect pages, click buttons, fill forms, or capture screenshots.