Agent Browser

Headless browser automation CLI for AI agents. Fast Rust CLI with Node.js fallback.

Works with: Claude Code, Cursor, GitHub Copilot, OpenAI Codex, Google Gemini, opencode.

Quick Navigation

| Topic | Reference |
| ------------ | --------------- |
| Installation | installation.md |
| Commands | commands.md |
| Refs | refs.md |
| Advanced | advanced.md |

When to Use

Automating browser tasks in AI agent workflows
Web scraping with AI-friendly output
Testing web applications with LLM agents
Managing multiple browser sessions with isolated auth

Core Concepts

Refs (Element References)

The snapshot command returns an accessibility tree in which each element has a unique ref such as @e1 or @e2. Refs are:

Deterministic - ref points to exact element from snapshot
Fast - no DOM re-query needed
AI-friendly - LLMs can reliably parse and use refs
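Because refs have a predictable shape, a script can check that a target looks like a ref before passing it to a command. The sketch below assumes refs always consist of @e followed by digits, as in the examples on this page; is_ref is a hypothetical helper, not part of agent-browser.

```shell
# is_ref is a hypothetical helper, not an agent-browser command.
# Returns 0 only for strings shaped like @e1, @e2, ... (assumed ref format).
is_ref() {
  case "$1" in
    @e) return 1 ;;           # prefix with no number
    @e*[!0-9]*) return 1 ;;   # something other than digits after @e
    @e*) return 0 ;;          # @e followed by digits only
    *) return 1 ;;            # CSS/XPath selector or anything else
  esac
}
```

A caller might guard an action with it, for example `is_ref "$target" && agent-browser click "$target"`.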

Architecture

Client-daemon architecture:

Rust CLI - parses commands, communicates with daemon
Node.js Daemon - manages Playwright browser instance

Daemon starts automatically and persists between commands.

v0.8.6 improves daemon reliability by cleaning stale socket/PID files and retrying transient connection errors.

Quick Example

# Navigate and get snapshot
agent-browser open example.com
agent-browser snapshot                    # Get accessibility tree with refs
agent-browser click @e2                   # Click by ref from snapshot
agent-browser fill @e3 "test@example.com" # Fill input by ref
agent-browser get text @e1                # Get text by ref
agent-browser screenshot page.png         # Save screenshot
agent-browser close

AI Workflow Pattern

Recommended workflow for AI agents:

# 1. Navigate and get snapshot
agent-browser open example.com
agent-browser snapshot -i --json   # AI parses tree and refs

# 2. AI identifies target refs from snapshot

# 3. Execute actions using refs
agent-browser click @e2
agent-browser fill @e3 "input text"

# 4. Get new snapshot if page changed
agent-browser snapshot -i --json
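Steps 3 and 4 can be combined so that every action is followed by a fresh snapshot. This is a minimal sketch, not part of the tool: ab and act_and_resnapshot are hypothetical wrapper names, and AGENT_BROWSER_BIN is an assumed environment override that exists only to make the wrapper easy to stub in tests.

```shell
# Hypothetical wrapper. AGENT_BROWSER_BIN is an assumed override for testing,
# not an option of agent-browser itself.
ab() {
  "${AGENT_BROWSER_BIN:-agent-browser}" "$@"
}

# Run one action, then take a fresh interactive snapshot only if it succeeded.
act_and_resnapshot() {
  ab "$@" && ab snapshot -i --json
}
```

For example, `act_and_resnapshot click @e2` runs the click and, if it succeeds, prints the updated snapshot so the agent never reuses refs from before the action.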

Headed Mode (Debugging)

agent-browser open example.com --headed

Local File Access (v0.9.1)

agent-browser open file:///path/to/doc.pdf --allow-file-access

Cursor-Aware Snapshots (v0.9.1)

agent-browser snapshot -C
agent-browser snapshot --cursor

Session Persistence (v0.10.0)

Automatically save and restore cookies/localStorage across restarts with a named session:

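# First run: cookies/localStorage are saved under the session name
agent-browser --session-name myapp open myapp.com
# After a restart: the saved state is restored
agent-browser --session-name myapp open myapp.com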

State management commands:

agent-browser state list
agent-browser state show myapp
agent-browser state rename myapp myapp-prod
agent-browser state clear myapp-prod
agent-browser state cleanup

Release Updates (v0.11.x–v0.12.0)

--annotate overlays numbered labels on interactive elements and prints a legend for multimodal reasoning.
Configuration file loading supports user/project scopes.
Command chaining with && is documented and recommended for daemon-backed multi-step runs.
Added profiling workflows and computed styles retrieval in advanced usage.
CDP connectivity and browser/device workflows were expanded.

New Tab Clicks (v0.10.0)

agent-browser click @e12 --new-tab

Mobile Safari (iOS)

agent-browser -p ios device list
agent-browser -p ios open https://example.com --device "iPhone 15"
agent-browser tap 200 400
agent-browser swipe 200 600 200 200 500

JSON Output

Use --json for machine-readable output:

agent-browser snapshot --json
agent-browser get text @e1 --json
agent-browser is visible @e2 --json

Critical Prohibitions

Do not use CSS/XPath selectors when refs are available (use @e1, @e2, etc.)
Do not forget to close sessions when done
Do not assume element positions without taking a fresh snapshot
Do not use old refs after page navigation or content changes (re-snapshot)
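One way to avoid leaving a session open is to wrap the open/close pair in a function that always closes. The sketch below is an illustration under assumptions: ab and with_session are hypothetical names, and AGENT_BROWSER_BIN is an assumed override used only for stubbing.

```shell
# Hypothetical helper. AGENT_BROWSER_BIN is an assumed test override.
ab() {
  "${AGENT_BROWSER_BIN:-agent-browser}" "$@"
}

# Open a URL, run one follow-up command, and close the session
# whether or not the follow-up succeeded. Returns the follow-up's status.
with_session() {
  url="$1"
  shift
  ab open "$url" || return 1
  ab "$@"
  status=$?
  ab close
  return "$status"
}
```

A call such as `with_session example.com get title` opens the page, prints the title, and closes the session even if the middle command fails.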

Common Commands

# Navigation
agent-browser open <url>
agent-browser back
agent-browser forward
agent-browser reload
agent-browser close

# Interaction
agent-browser click <sel>
agent-browser click <sel> --new-tab
agent-browser fill <sel> <text>
agent-browser press <key>
agent-browser hover <sel>
agent-browser select <sel> <val>
agent-browser download <sel> <path>  # v0.7+

# Info
agent-browser get text <sel>
agent-browser get url
agent-browser get title
agent-browser is visible <sel>

# Snapshots & Screenshots
agent-browser snapshot -i --json
agent-browser screenshot [path]
