agent-browser CLI
A headless browser automation CLI designed for AI agents, with fast Rust-based execution and element refs optimized for LLM reasoning.
Overview
agent-browser provides programmatic browser control through a CLI that's purpose-built for AI agent workflows. It uses deterministic element references (@e1, @e2, etc.) from accessibility trees, making it ideal for LLM-based automation where consistent element targeting is critical.
Key Features:
- Fast Rust-based CLI with Node.js fallback
- Element refs (
@e1,@e2) for stable LLM reasoning - Session isolation for parallel browser instances
- JSON output for programmatic parsing
- Accessibility tree snapshots for element discovery
- CDP connection support for existing browser instances
When to Use
- Automating web interactions (form filling, clicking, navigation)
- Scraping web content with accessibility tree parsing
- Testing web applications programmatically
- Multi-agent scenarios requiring isolated browser sessions
- Screenshot capture and PDF generation
- Monitoring web page state changes
Prerequisites
- Node.js >= 18 or Bun runtime
- Chromium (installed via
agent-browser install)
Installation
# Install globally
bun install -g agent-browser
# Download Chromium
agent-browser install
# Linux with system dependencies
agent-browser install --with-deps
Verify installation:
agent-browser --version
Quick Start
# Navigate to a page
agent-browser open https://example.com
# Get accessibility snapshot with element refs
agent-browser snapshot -i # -i = interactive elements only
# Click an element by ref
agent-browser click @e2
# Fill a form field
agent-browser fill @e3 "test@example.com"
# Take a screenshot
agent-browser screenshot output.png
Core Workflow
1. Open Page and Snapshot
# Open URL
agent-browser open https://example.com
# Get interactive elements with refs
agent-browser snapshot -i --json
The snapshot returns elements like:
@e1 link "Home"
@e2 textbox "Email"
@e3 button "Submit"
2. Interact Using Refs
# Click by ref
agent-browser click @e2
# Type into focused element
agent-browser type @e2 "hello@example.com"
# Fill (clears first, then types)
agent-browser fill @e2 "hello@example.com"
# Press keyboard key
agent-browser press Enter
3. Verify State
# Check element visibility
agent-browser is visible @e3
# Get element text
agent-browser get text @e1
# Get page URL
agent-browser get url
Command Reference
Navigation
| Command | Description |
|---------|-------------|
| open <url> | Navigate to URL |
| back | Go back in history |
| forward | Go forward in history |
| reload | Reload current page |
| close | Close browser |
Interactions
| Command | Description |
|---------|-------------|
| click <ref> | Click element |
| dblclick <ref> | Double-click element |
| type <ref> <text> | Type text into element |
| fill <ref> <text> | Clear and fill element |
| press <key> | Press keyboard key (Enter, Tab, Control+a) |
| hover <ref> | Hover over element |
| focus <ref> | Focus element |
| check <ref> | Check checkbox |
| uncheck <ref> | Uncheck checkbox |
| select <ref> <val> | Select dropdown option |
| scroll <dir> [px] | Scroll (up/down/left/right) |
| wait <ref\|ms> | Wait for element or milliseconds |
Information
| Command | Description |
|---------|-------------|
| snapshot | Get accessibility tree with refs |
| snapshot -i | Interactive elements only |
| snapshot -c | Compact (remove empty elements) |
| snapshot -d <n> | Limit tree depth |
| get text <ref> | Get element text content |
| get html <ref> | Get element HTML |
| get value <ref> | Get input value |
| get url | Get current page URL |
| get title | Get page title |
| screenshot [path] | Take screenshot |
| screenshot --full | Full page screenshot |
| pdf <path> | Save page as PDF |
State Checks
| Command | Description |
|---------|-------------|
| is visible <ref> | Check if element is visible |
| is enabled <ref> | Check if element is enabled |
| is checked <ref> | Check if checkbox is checked |
Find Elements
# Find by role and click
agent-browser find role button click --name Submit
# Find by text
agent-browser find text "Sign In" click
# Find by label
agent-browser find label "Email" fill "test@example.com"
# Find by placeholder
agent-browser find placeholder "Search..." type "query"
Session Management
Sessions provide isolated browser instances for parallel execution:
# Use named session
agent-browser --session login-flow open https://app.com
# Different session for another task
agent-browser --session checkout open https://app.com/cart
# List active sessions
agent-browser session list
# Environment variable (persistent across commands)
export AGENT_BROWSER_SESSION=my-session
agent-browser open https://example.com
JSON Output
Use --json for machine-readable output:
# Snapshot as JSON
agent-browser snapshot -i --json
# Get element info as JSON
agent-browser get text @e1 --json
# Parse in scripts
agent-browser get url --json | jq -r '.url'
Common Pitfalls
This section covers frequent failure modes and how to debug them. Each pitfall includes the symptom you'll see, the underlying cause, and the fix.
Browser not found
Symptom: Error: Browser executable not found at default path or agent-browser: command not found
Cause: Chromium isn't installed, or agent-browser CLI isn't on PATH
Fix:
# Install Chromium
agent-browser install
# For Linux, install system dependencies
agent-browser install --with-deps
# Verify installation
agent-browser --version
Element ref becomes stale after page change
Symptom: Error: Element @e2 not found even though you just used it, or clicks fail silently
Cause: Element refs change when the DOM updates (navigation, page reload, dynamic content). Refs are only valid for the current DOM snapshot.
Fix:
# Always re-snapshot after major page changes
agent-browser click @e1 # Click submit
agent-browser wait 2000 # Wait for navigation
agent-browser snapshot -i --json # Get fresh refs
agent-browser click @e1 # Use new ref (was @e1 before? Verify!)
Prevention: In scripts, capture fresh refs after each navigation or dynamic load.
Session conflicts and port binding
Symptom: Error: Session "default" already in use or Port 9222 already in use
Cause: Browser session is still running from a previous command or crashed script, preventing a new session from starting
Fix:
# List active sessions
agent-browser session list
# Kill a specific session
agent-browser session kill my-session
# Use unique session names to avoid conflicts
agent-browser --session upload-$(date +%s) open https://app.com
# Or set environment variable once
export AGENT_BROWSER_SESSION=task-$(date +%s)
agent-browser open https://example.com
agent-browser close
Clicks and interactions don't work on hidden elements
Symptom: agent-browser click @e5 succeeds but nothing happens, or element is invisible in screenshot
Cause: Element is outside viewport, hidden by CSS (display: none, visibility: hidden), or covered by another element. Accessibility tree may show it, but browser can't interact with it.
Fix:
# Check visibility before interaction
agent-browser is visible @e5
# Scroll to make element visible
agent-browser scroll down 500
# Take screenshot to visually verify
agent-browser screenshot current-state.png
# Try parent element if child is deeply nested
agent-browser click @e4 # Click parent button instead
Prevention: Always take screenshots before and after critical interactions to verify visual state.
Timeout waiting for navigation or element
Symptom: Error: Timeout waiting for element or script hangs indefinitely after clicking a link
Cause:
- Element doesn't exist or appears after longer than expected delay
- Navigation is stuck (network error, page not loading)
- Element selector is wrong
Fix:
# Explicit wait before checking for element
agent-browser wait 2000 # Wait 2 seconds
agent-browser snapshot -i --json # Check if element exists
# Wait for specific element to appear
agent-browser wait @e3
# Increase timeout with explicit waits in script
agent-browser click @submit
sleep 3 # Shell script sleep
agent-browser snapshot -i
# Check page title to verify navigation worked
agent-browser get title
Prevention: Add explicit wait commands after form submissions or link clicks that cause navigation.
Snapshot is empty or missing expected elements
Symptom: agent-browser snapshot -i --json returns [] or very few elements, but you see them visually in screenshot
Cause:
- Page content is rendered by JavaScript (React, Vue, etc.) that hasn't completed
- Elements lack accessible labels or roles
-iflag filters too aggressively
Fix:
# Wait for content to render
agent-browser wait 2000
agent-browser snapshot -i
# Remove interactive-only filter to see all elements
agent-browser snapshot -c # Compact but includes all elements
# Increase tree depth to find nested elements
agent-browser snapshot -d 5
# Use full snapshot to diagnose
agent-browser snapshot --json | jq . | less
# Check page title to verify page loaded
agent-browser get title
Prevention: Add wait delays after opening pages with heavy JavaScript rendering.
Form fill doesn't work or text is partial
Symptom: Text appears to be cut off, only first few characters typed, or field stays empty
Cause:
- Field wasn't focused before typing (race condition)
- Field has input validation or character limit
- JavaScript event handlers expect specific typing speed or blur event
Fix:
# Always focus explicitly before filling
agent-browser focus @e2
agent-browser wait 500 # Let field settle
agent-browser fill @e2 "complete@example.com"
# For fields with slow JS event handling
agent-browser type @e2 "first part"
agent-browser wait 500
agent-browser type @e2 "second part"
# Verify what was actually entered
agent-browser get value @e2
Prevention: After any form fill, immediately verify the value was entered correctly with get value.
Network requests fail or external API timeouts
Symptom: Page loads but data doesn't populate, or integration tests fail when hitting real APIs
Cause:
- External API is slow or unreachable
- CORS errors (browser blocks requests to different domain)
- Missing authentication headers
- Network is isolated (testing environment)
Fix:
# Set custom headers for auth
agent-browser --headers '{"Authorization": "Bearer token123"}' open https://api.example.com
# Mock API responses instead
agent-browser network route "**/api/data" --body '{"result":"mocked"}'
# Add explicit wait for API response
agent-browser wait 3000 # Give API time to respond
agent-browser snapshot -i
# Fallback: Use `--headed` to see error messages
agent-browser --headed open https://app.com
agent-browser click @e1 # See browser console errors
Prevention: For testing, use network route to mock external APIs. For production, verify connectivity before running automation.
Browser crashes or becomes unresponsive
Symptom: agent-browser command hangs or process crashes with segmentation fault; Chrome window closes unexpectedly
Cause:
- Out of memory (too many long-running sessions or screenshots)
- Invalid CDP connection
- Rare Chromium/Playwright crash
- Insufficient system resources
Fix:
# Restart with fresh session
agent-browser session kill all-sessions # Or specify a name
# Use lightweight mode with limited resources
agent-browser --session lite open https://minimal-site.com
# Reduce screenshot frequency
# Instead of: agent-browser screenshot after every click
# Do: agent-browser screenshot only on failure
# Check system resources
top -n 1 | head -20
# Run with --debug for crash diagnostics
agent-browser --debug open https://example.com 2>&1 | tail -50
Prevention:
- Close sessions explicitly:
agent-browser close - Use unique session names for long-lived tasks
- Monitor memory in long automation scripts
Element ref is ambiguous (@e1 appears for multiple elements)
Symptom: Multiple elements resolve to the same @e number, clicking @e1 affects the wrong element
Cause: Accessibility tree groups similar elements (buttons, links) and agent-browser assigns refs based on order, not uniqueness
Fix:
# Use full snapshot to understand structure
agent-browser snapshot -d 3 --json > structure.json
cat structure.json | jq . # Examine hierarchy
# Click using find by text if available
agent-browser find text "Specific Button Text" click
# Use CSS selector if accessibility tree isn't clear
agent-browser find selector "button.submit-btn" click
# Take screenshot and count visually
agent-browser screenshot debug.png
# Then manually map which @eN corresponds to which button
# Use context to disambiguate
agent-browser get text @e1 # Check what @e1 actually is
Prevention: Always verify element identity by checking its text or taking screenshots before critical interactions.
Advanced Features
CDP Connection
Connect to an existing Chrome instance:
# Start Chrome with debugging
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222
# Connect agent-browser
agent-browser --cdp 9222 snapshot
Video Recording
# Start recording
agent-browser record start recording.webm https://example.com
# Perform actions...
agent-browser click @e1
agent-browser fill @e2 "test"
# Stop and save
agent-browser record stop
Network Interception
# Block requests to URL pattern
agent-browser network route "**/analytics/*" --abort
# Mock API response
agent-browser network route "**/api/user" --body '{"name": "Test"}'
# View captured requests
agent-browser network requests
Custom Headers
# Set auth headers
agent-browser --headers '{"Authorization": "Bearer token123"}' open https://api.example.com
Browser Extensions
# Load extension
agent-browser --extension /path/to/extension open https://example.com
AI Agent Patterns
Form Automation
#!/bin/bash
# Login automation script
agent-browser open https://app.com/login
# Get form elements
agent-browser snapshot -i --json > elements.json
# Fill credentials
agent-browser fill @e2 "user@example.com"
agent-browser fill @e3 "password123"
# Submit
agent-browser click @e4
# Wait for navigation
agent-browser wait 2000
# Verify login
agent-browser get url --json | jq -r '.url'
Data Extraction
#!/bin/bash
# Scrape product data
agent-browser open https://shop.com/products
# Get page content
agent-browser snapshot --json > snapshot.json
# Extract specific element text
agent-browser get text '[data-testid="price"]' --json
# Screenshot for verification
agent-browser screenshot products.png
Multi-Page Workflow
#!/bin/bash
SESSION="checkout-flow"
# Step 1: Add to cart
agent-browser --session $SESSION open https://shop.com/product/123
agent-browser --session $SESSION click @add-to-cart-button
agent-browser --session $SESSION wait 1000
# Step 2: Go to checkout
agent-browser --session $SESSION open https://shop.com/checkout
agent-browser --session $SESSION snapshot -i
# Step 3: Fill shipping
agent-browser --session $SESSION fill @shipping-name "John Doe"
agent-browser --session $SESSION fill @shipping-address "123 Main St"
# Cleanup
agent-browser --session $SESSION close
Options Reference
| Option | Description |
|--------|-------------|
| --session <name> | Isolated browser session |
| --json | JSON output format |
| --headed | Show browser window (not headless) |
| --cdp <port> | Connect via Chrome DevTools Protocol |
| --headers <json> | HTTP headers for requests |
| --proxy <url> | Proxy server |
| --executable-path <path> | Custom browser executable |
| --extension <path> | Load browser extension |
| --full, -f | Full page screenshot |
| --debug | Debug output |
Environment Variables
| Variable | Description |
|----------|-------------|
| AGENT_BROWSER_SESSION | Default session name |
| AGENT_BROWSER_EXECUTABLE_PATH | Custom browser path |
| AGENT_BROWSER_STREAM_PORT | WebSocket streaming port |
Examples
Example 1: Simple Web Scraping
User request:
"Scrape the product title and price from https://example.com/product/123"
Workflow:
# 1. Navigate to page
agent-browser open https://example.com/product/123
# 2. Get accessibility snapshot
agent-browser snapshot -i --json > page.json
# 3. Identify elements (example output):
# @e5 heading "Premium Widget Pro"
# @e12 text "$49.99"
# 4. Extract data
agent-browser get text @e5 --json # Returns: {"text": "Premium Widget Pro"}
agent-browser get text @e12 --json # Returns: {"text": "$49.99"}
Expected output:
{
"title": "Premium Widget Pro",
"price": "$49.99"
}
Time: 2-3 seconds
Example 2: Form Submission with Verification
User request:
"Fill out the contact form at https://example.com/contact and verify submission"
Workflow:
# 1. Navigate and snapshot
agent-browser open https://example.com/contact
agent-browser snapshot -i
# Example snapshot output:
# @e1 textbox "Name"
# @e2 textbox "Email"
# @e3 textbox "Message"
# @e4 button "Send"
# 2. Fill form fields
agent-browser fill @e1 "John Doe"
agent-browser fill @e2 "john@example.com"
agent-browser fill @e3 "I have a question about your product"
# 3. Submit
agent-browser click @e4
# 4. Wait for response
agent-browser wait 2000
# 5. Verify success
agent-browser snapshot -i | grep "Thank you"
# Or take screenshot for manual verification
agent-browser screenshot success.png
Expected outcome:
- Form submitted successfully
- Page shows confirmation message
- Screenshot saved showing success state
Time: 4-6 seconds
Example 3: Multi-Page Workflow with Session
User request:
"Add item to cart, proceed to checkout, and fill shipping info"
Workflow:
# Use session for state persistence across commands
SESSION="checkout-flow-$(date +%s)"
# 1. Add to cart
agent-browser --session $SESSION open https://shop.example.com/product/widget
agent-browser --session $SESSION snapshot -i | grep "Add to Cart"
agent-browser --session $SESSION click @add-to-cart # Assuming @add-to-cart ref found
agent-browser --session $SESSION wait 1000 # Wait for cart update
# 2. Go to checkout
agent-browser --session $SESSION open https://shop.example.com/checkout
agent-browser --session $SESSION snapshot -i
# 3. Fill shipping form (refs from snapshot)
agent-browser --session $SESSION fill @shipping-name "Jane Smith"
agent-browser --session $SESSION fill @shipping-address "456 Oak Ave"
agent-browser --session $SESSION fill @shipping-city "Portland"
agent-browser --session $SESSION select @shipping-state "OR"
agent-browser --session $SESSION fill @shipping-zip "97201"
# 4. Take screenshot before submission
agent-browser --session $SESSION screenshot checkout-filled.png
# 5. Cleanup
agent-browser --session $SESSION close
Expected outcome:
- Item added to cart successfully
- Checkout form filled with shipping details
- Screenshot saved showing completed form
- Session cleaned up
Time: 8-12 seconds
Example 4: Login and Data Extraction
User request:
"Log into dashboard, navigate to reports, and extract table data"
Workflow:
SESSION="dashboard-scrape"
# 1. Login
agent-browser --session $SESSION open https://app.example.com/login
agent-browser --session $SESSION snapshot -i
# Fill login form
agent-browser --session $SESSION fill @email "user@example.com"
agent-browser --session $SESSION fill @password "secretpass"
agent-browser --session $SESSION click @submit
agent-browser --session $SESSION wait 2000
# 2. Navigate to reports
agent-browser --session $SESSION open https://app.example.com/reports
agent-browser --session $SESSION wait 1000
# 3. Extract table data
agent-browser --session $SESSION snapshot --json > reports.json
# Parse JSON to extract table rows
# @e20 table
# @e21 row "Q1 2024, $50,000, 15% growth"
# @e22 row "Q2 2024, $62,000, 24% growth"
agent-browser --session $SESSION get text @e20 --json
# 4. Cleanup
agent-browser --session $SESSION close
Expected output:
{
"reports": [
{"quarter": "Q1 2024", "revenue": "$50,000", "growth": "15%"},
{"quarter": "Q2 2024", "revenue": "$62,000", "growth": "24%"}
]
}
Time: 6-10 seconds
Troubleshooting
Issue: Element Reference Not Found
Symptoms:
Error: Element @e5 not found
Cause: Element ref changed after page update or snapshot was stale
Solution:
# 1. Get fresh snapshot
agent-browser snapshot -i
# 2. Verify element ref in output
# Look for expected element in snapshot output
# 3. If element doesn't appear, try without -i flag
agent-browser snapshot # Shows all elements, not just interactive
# 4. If still missing, check if page loaded completely
agent-browser wait 2000 # Wait for page to settle
agent-browser snapshot -i
Issue: Timeout Waiting for Navigation
Symptoms:
Error: Navigation timeout after 30000ms
Cause: Page takes too long to load or network issues
Solution:
# 1. Check if page is accessible
agent-browser --headed open https://example.com # Visual debugging
# 2. Increase wait time after navigation
agent-browser open https://slow-site.com
agent-browser wait 5000 # Wait 5 seconds
# 3. Check network requests
agent-browser network requests # See what's loading
# 4. Use CDP connection for better control
agent-browser --cdp 9222 open https://example.com
Issue: Click Not Working
Symptoms: Element click command runs but nothing happens
Cause: Element not visible, disabled, or covered by another element
Solution:
# 1. Check if element is visible
agent-browser is visible @e3
# Returns: true/false
# 2. Check if element is enabled
agent-browser is enabled @e3
# 3. Try scrolling to element first
agent-browser scroll down 500
agent-browser wait 500
agent-browser click @e3
# 4. Take screenshot to see page state
agent-browser screenshot debug.png
# 5. Try double-click instead
agent-browser dblclick @e3
Issue: Form Input Not Registering
Symptoms: Text typed but form field remains empty
Cause: JavaScript framework requires special events or delays
Solution:
# 1. Focus element first
agent-browser focus @e2
agent-browser wait 200
# 2. Use fill instead of type (clears first)
agent-browser fill @e2 "text content"
# 3. Add delays between keystrokes
agent-browser type @e2 "slow"
agent-browser wait 100
# 4. Try pressing Tab after filling to trigger validation
agent-browser fill @e2 "email@example.com"
agent-browser press Tab
Issue: Session Persistence Problems
Symptoms: Previous session state not available
Cause: Session wasn't properly named or browser closed unexpectedly
Solution:
# 1. Always use explicit session names
SESSION="my-workflow-$(date +%s)" # Unique session ID
agent-browser --session $SESSION open https://example.com
# 2. Check active sessions
agent-browser --session $SESSION get url
# If error, session doesn't exist
# 3. Don't reuse session names across runs
# Each workflow should have unique session ID
# 4. Cleanup sessions when done
agent-browser --session $SESSION close
Issue: Chromium Installation Failed
Symptoms:
Error: Failed to download Chromium
Cause: Network issues, disk space, or missing system dependencies
Solution:
# 1. Check disk space
df -h # Ensure >2GB free
# 2. Retry installation with verbose output
agent-browser install --debug
# 3. On Linux, install system dependencies first
agent-browser install --with-deps
# 4. Use custom browser if needed
agent-browser --executable-path /usr/bin/chromium open https://example.com
# 5. Verify installation
agent-browser --version
ls ~/.cache/ms-playwright/ # Check if Chromium exists
Issue: JSON Output Malformed
Symptoms: Can't parse JSON output from commands
Cause: Mixed text/JSON output or errors in JSON mode
Solution:
# 1. Use --json flag consistently
agent-browser get text @e5 --json # Not: agent-browser get text @e5
# 2. Redirect stderr to separate stream
agent-browser snapshot --json 2>errors.log >output.json
# 3. Validate JSON output
agent-browser snapshot --json | jq . # Will error if invalid
# 4. Check for error messages in output
agent-browser snapshot --json | grep -E '^{' | jq .
Best Practices
- Always snapshot first: Get element refs before interacting
- Use
--jsonfor parsing: Machine-readable output for scripts - Session isolation: Use
--sessionfor parallel workflows - Explicit waits: Add
waitcommands after navigation/clicks - Interactive snapshots: Use
-iflag to reduce noise - Verify state: Check element visibility before clicking
- Screenshot on failure: Capture state when automation fails