Agent Skills: Agent Browser

Headless browser automation CLI for AI agents. Covers commands, refs, sessions, snapshots, cloud providers, profiles. Keywords: agent-browser, browser automation, refs, snapshot.

Uncategorized. ID: itechmeat/llm-code/agent-browser

Install this agent skill locally:

pnpm dlx add-skill https://github.com/itechmeat/llm-code/tree/HEAD/skills/agent-browser

skills/agent-browser/SKILL.md

Skill Metadata

Name
agent-browser

Agent Browser

Headless browser automation CLI for AI agents. Fast Rust CLI with Node.js fallback.

Works with: Claude Code, Cursor, GitHub Copilot, OpenAI Codex, Google Gemini, opencode.

Quick Navigation

| Topic        | Reference       |
| ------------ | --------------- |
| Installation | installation.md |
| Commands     | commands.md     |
| Refs         | refs.md         |
| Advanced     | advanced.md     |

When to Use

  • Automating browser tasks in AI agent workflows
  • Web scraping with AI-friendly output
  • Testing web applications with LLM agents
  • Managing multiple browser sessions with isolated auth

Core Concepts

Refs (Element References)

The snapshot command returns an accessibility tree where each element has a unique ref like @e1, @e2:

  • Deterministic - a ref points to the exact element captured in the snapshot
  • Fast - no DOM re-query is needed to resolve a ref
  • AI-friendly - LLMs can reliably parse and reuse refs
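
Because refs follow the @e1, @e2 shape shown above, a wrapper script can reject malformed refs before they reach the CLI. The sketch below assumes refs are always "@e" followed by digits. `is_ref` is a hypothetical helper name and is not part of agent-browser.

```shell
#!/bin/sh
# is_ref: succeed only if the argument looks like a snapshot ref (@e1, @e2, ...).
# Assumption: refs are always "@e" followed by one or more digits.
is_ref() {
  case "$1" in
    @e | @e*[!0-9]*) return 1 ;;  # no digits at all, or a non-digit after "@e"
    @e[0-9]*)        return 0 ;;  # "@e" followed only by digits
    *)               return 1 ;;  # anything else (CSS selector, empty string, ...)
  esac
}
```

A script might guard an action with `is_ref "$target" && agent-browser click "$target"`, so that a CSS selector or a truncated ref fails early instead of being sent to the daemon.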

Architecture

Client-daemon architecture:

  1. Rust CLI - parses commands, communicates with daemon
  2. Node.js Daemon - manages Playwright browser instance

Daemon starts automatically and persists between commands.

v0.8.6 improves daemon reliability by cleaning stale socket/PID files and retrying transient connection errors.

Quick Example

# Navigate and get snapshot
agent-browser open example.com
agent-browser snapshot                    # Get accessibility tree with refs
agent-browser click @e2                   # Click by ref from snapshot
agent-browser fill @e3 "test@example.com" # Fill input by ref
agent-browser get text @e1                # Get text by ref
agent-browser screenshot page.png         # Save screenshot
agent-browser close

AI Workflow Pattern

Optimal workflow for AI agents:

# 1. Navigate and get snapshot
agent-browser open example.com
agent-browser snapshot -i --json   # AI parses tree and refs

# 2. AI identifies target refs from snapshot

# 3. Execute actions using refs
agent-browser click @e2
agent-browser fill @e3 "input text"

# 4. Get new snapshot if page changed
agent-browser snapshot -i --json
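
Steps 3 and 4 can be folded into one helper so that every action is followed by a fresh snapshot. This is a minimal sketch, not part of agent-browser: `act_then_snapshot` is a hypothetical function name, and `AB` is a hypothetical override variable that lets the script point at a different binary or a test stub.

```shell
#!/bin/sh
# AB is a hypothetical override; by default the real CLI is used.
AB="${AB:-agent-browser}"

# act_then_snapshot: run one action (for example: click @e2), then take a
# new snapshot so the next step works from current refs, not stale ones.
# The snapshot is skipped if the action itself fails.
act_then_snapshot() {
  "$AB" "$@" && "$AB" snapshot -i --json
}
```

For example, `act_then_snapshot click @e2` runs the click and prints the updated tree for the agent to parse.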

Headed Mode (Debugging)

agent-browser open example.com --headed

Local File Access (v0.9.1)

agent-browser open file:///path/to/doc.pdf --allow-file-access

Cursor-Aware Snapshots (v0.9.1)

agent-browser snapshot -C        # short flag
agent-browser snapshot --cursor  # equivalent long flag

Session Persistence (v0.10.0)

Automatically save and restore cookies/localStorage across restarts with a named session:

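agent-browser --session-name myapp open myapp.com  # first run: state is saved under "myapp"
agent-browser --session-name myapp open myapp.com  # after a restart: saved cookies/localStorage are restored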
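
When many commands share one named session, a small wrapper avoids repeating the flag. The sketch below assumes `--session-name` is accepted before the subcommand, as in the lines above. `with_session` and the `AB` override variable are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# AB is a hypothetical override; by default the real CLI is used.
AB="${AB:-agent-browser}"

# with_session: run any agent-browser command under a named session.
# Usage: with_session <name> <command> [args...]
with_session() {
  session_name="$1"
  shift
  "$AB" --session-name "$session_name" "$@"
}
```

With this in place, `with_session myapp open myapp.com` expands to the same command shown above.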

State management commands:

agent-browser state list
agent-browser state show myapp
agent-browser state rename myapp myapp-prod
agent-browser state clear myapp-prod
agent-browser state cleanup

Release Updates (v0.11.x–v0.12.0)

  • --annotate overlays numbered labels on interactive elements and prints a legend for multimodal reasoning.
  • Configuration file loading supports user/project scopes.
  • Command chaining with && is documented and recommended for daemon-backed multi-step runs.
  • Added profiling workflows and computed styles retrieval in advanced usage.
  • CDP connectivity and browser/device workflows were expanded.

New Tab Clicks (v0.10.0)

agent-browser click @e12 --new-tab

Mobile Safari (iOS)

agent-browser -p ios device list
agent-browser -p ios open https://example.com --device "iPhone 15"
agent-browser tap 200 400
agent-browser swipe 200 600 200 200 500

JSON Output

Use --json for machine-readable output:

agent-browser snapshot --json
agent-browser get text @e1 --json
agent-browser is visible @e2 --json

Critical Prohibitions

  • Do not use CSS/XPath selectors when refs are available (use @e1, @e2, etc.)
  • Do not forget to close sessions when done
  • Do not assume element positions without taking a fresh snapshot
  • Do not use old refs after page navigation or content changes (re-snapshot)
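
One way to honor the "close sessions" rule in scripts is an exit trap, so the browser is closed even when an earlier step fails. This is a sketch under assumptions: `browse` is a hypothetical function name, and `AB` is a hypothetical override variable for the binary.

```shell
#!/bin/sh
# AB is a hypothetical override; by default the real CLI is used.
AB="${AB:-agent-browser}"

# browse: open a URL and print a snapshot. The body runs in a subshell
# (note the parentheses), and the EXIT trap closes the browser when that
# subshell ends, whether the steps succeeded or not.
browse() (
  trap '"$AB" close' EXIT
  "$AB" open "$1" && "$AB" snapshot -i --json
)
```

Calling `browse example.com` then issues open, snapshot, and close in that order. If open fails, the snapshot is skipped but close still runs.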

Common Commands

# Navigation
agent-browser open <url>
agent-browser back
agent-browser forward
agent-browser reload
agent-browser close

# Interaction
agent-browser click <sel>
agent-browser click <sel> --new-tab
agent-browser fill <sel> <text>
agent-browser press <key>
agent-browser hover <sel>
agent-browser select <sel> <val>
agent-browser download <sel> <path>  # v0.7+

# Info
agent-browser get text <sel>
agent-browser get url
agent-browser get title
agent-browser is visible <sel>

# Snapshots & Screenshots
agent-browser snapshot -i --json
agent-browser screenshot [path]
