Agent Skills: Agentic Browser

A fast Rust-based headless browser automation CLI with Node.js fallback that enables AI agents to navigate, click, type, and snapshot pages via structured commands.

UncategorizedID: aiskillstore/marketplace/agent-browser

Install this agent skill to your local

pnpm dlx add-skill https://github.com/aiskillstore/marketplace/tree/HEAD/skills/inf-sh/agent-browser

Skill Files

Browse the full folder contents for agent-browser.

Download Skill

Loading file tree…

skills/inf-sh/agent-browser/SKILL.md

Skill Metadata

Name
agent-browser
Description
"Browser automation for AI agents via inference.sh. Navigate web pages, interact with elements using @e refs, take screenshots, record video. Capabilities: web scraping, form filling, clicking, typing, drag-drop, file upload, JavaScript execution. Use for: web automation, data extraction, testing, agent browsing, research. Triggers: browser, web automation, scrape, navigate, click, fill form, screenshot, browse web, playwright, headless browser, web agent, surf internet, record video"

Agentic Browser

Browser automation for AI agents via inference.sh. Uses Playwright under the hood with a simple @e ref system for element interaction.

Agentic Browser

Quick Start

# Install CLI
curl -fsSL https://cli.inference.sh | sh && infsh login

# Open a page and get interactive elements
infsh app run agent-browser --function open --input '{"url": "https://example.com"}' --session new

Core Workflow

Every browser automation follows this pattern:

  1. Open - Navigate to URL, get @e refs for elements
  2. Interact - Use refs to click, fill, drag, etc.
  3. Re-snapshot - After navigation/changes, get fresh refs
  4. Close - End session (returns video if recording)
# 1. Start session
RESULT=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com/login"
}')
SESSION_ID=$(echo $RESULT | jq -r '.session_id')
# Elements: @e1 [input] "Email", @e2 [input] "Password", @e3 [button] "Sign In"

# 2. Fill and submit
infsh app run agent-browser --function interact --session $SESSION_ID --input '{
  "action": "fill", "ref": "@e1", "text": "user@example.com"
}'
infsh app run agent-browser --function interact --session $SESSION_ID --input '{
  "action": "fill", "ref": "@e2", "text": "password123"
}'
infsh app run agent-browser --function interact --session $SESSION_ID --input '{
  "action": "click", "ref": "@e3"
}'

# 3. Re-snapshot after navigation
infsh app run agent-browser --function snapshot --session $SESSION_ID --input '{}'

# 4. Close when done
infsh app run agent-browser --function close --session $SESSION_ID --input '{}'

Functions

| Function | Description | |----------|-------------| | open | Navigate to URL, configure browser (viewport, proxy, video recording) | | snapshot | Re-fetch page state with @e refs after DOM changes | | interact | Perform actions using @e refs (click, fill, drag, upload, etc.) | | screenshot | Take page screenshot (viewport or full page) | | execute | Run JavaScript code on the page | | close | Close session, returns video if recording was enabled |

Interact Actions

| Action | Description | Required Fields | |--------|-------------|-----------------| | click | Click element | ref | | dblclick | Double-click element | ref | | fill | Clear and type text | ref, text | | type | Type text (no clear) | text | | press | Press key (Enter, Tab, etc.) | text | | select | Select dropdown option | ref, text | | hover | Hover over element | ref | | check | Check checkbox | ref | | uncheck | Uncheck checkbox | ref | | drag | Drag and drop | ref, target_ref | | upload | Upload file(s) | ref, file_paths | | scroll | Scroll page | direction (up/down/left/right), scroll_amount | | back | Go back in history | - | | wait | Wait milliseconds | wait_ms | | goto | Navigate to URL | url |

Element Refs

Elements are returned with @e refs:

@e1 [a] "Home" href="/"
@e2 [input type="text"] placeholder="Search"
@e3 [button] "Submit"
@e4 [select] "Choose option"
@e5 [input type="checkbox"] name="agree"

Important: Refs are invalidated after navigation. Always re-snapshot after:

  • Clicking links/buttons that navigate
  • Form submissions
  • Dynamic content loading

Features

Video Recording

Record browser sessions for debugging or documentation:

# Start with recording enabled (optionally show cursor indicator)
SESSION=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com",
  "record_video": true,
  "show_cursor": true
}' | jq -r '.session_id')

# ... perform actions ...

# Close to get the video file
infsh app run agent-browser --function close --session $SESSION --input '{}'
# Returns: {"success": true, "video": <File>}

Cursor Indicator

Show a visible cursor in screenshots and video (useful for demos):

infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com",
  "show_cursor": true,
  "record_video": true
}'

The cursor appears as a red dot that follows mouse movements and shows click feedback.

Proxy Support

Route traffic through a proxy server:

infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com",
  "proxy_url": "http://proxy.example.com:8080",
  "proxy_username": "user",
  "proxy_password": "pass"
}'

File Upload

Upload files to file inputs:

infsh app run agent-browser --function interact --session $SESSION --input '{
  "action": "upload",
  "ref": "@e5",
  "file_paths": ["/path/to/file.pdf"]
}'

Drag and Drop

Drag elements to targets:

infsh app run agent-browser --function interact --session $SESSION --input '{
  "action": "drag",
  "ref": "@e1",
  "target_ref": "@e2"
}'

JavaScript Execution

Run custom JavaScript:

infsh app run agent-browser --function execute --session $SESSION --input '{
  "code": "document.querySelectorAll(\"h2\").length"
}'
# Returns: {"result": "5", "screenshot": <File>}

Deep-Dive Documentation

| Reference | Description | |-----------|-------------| | references/commands.md | Full function reference with all options | | references/snapshot-refs.md | Ref lifecycle, invalidation rules, troubleshooting | | references/session-management.md | Session persistence, parallel sessions | | references/authentication.md | Login flows, OAuth, 2FA handling | | references/video-recording.md | Recording workflows for debugging | | references/proxy-support.md | Proxy configuration, geo-testing |

Ready-to-Use Templates

| Template | Description | |----------|-------------| | templates/form-automation.sh | Form filling with validation | | templates/authenticated-session.sh | Login once, reuse session | | templates/capture-workflow.sh | Content extraction with screenshots |

Examples

Form Submission

SESSION=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com/contact"
}' | jq -r '.session_id')

# Get elements: @e1 [input] "Name", @e2 [input] "Email", @e3 [textarea], @e4 [button] "Send"

infsh app run agent-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e1", "text": "John Doe"}'
infsh app run agent-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e2", "text": "john@example.com"}'
infsh app run agent-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e3", "text": "Hello!"}'
infsh app run agent-browser --function interact --session $SESSION --input '{"action": "click", "ref": "@e4"}'

infsh app run agent-browser --function snapshot --session $SESSION --input '{}'
infsh app run agent-browser --function close --session $SESSION --input '{}'

Search and Extract

SESSION=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://google.com"
}' | jq -r '.session_id')

infsh app run agent-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e1", "text": "weather today"}'
infsh app run agent-browser --function interact --session $SESSION --input '{"action": "press", "text": "Enter"}'
infsh app run agent-browser --function interact --session $SESSION --input '{"action": "wait", "wait_ms": 2000}'

infsh app run agent-browser --function snapshot --session $SESSION --input '{}'
infsh app run agent-browser --function close --session $SESSION --input '{}'

Screenshot with Video

SESSION=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com",
  "record_video": true
}' | jq -r '.session_id')

# Take full page screenshot
infsh app run agent-browser --function screenshot --session $SESSION --input '{
  "full_page": true
}'

# Close and get video
RESULT=$(infsh app run agent-browser --function close --session $SESSION --input '{}')
echo $RESULT | jq '.video'

Sessions

Browser state persists within a session. Always:

  1. Start with --session new on first call
  2. Use returned session_id for subsequent calls
  3. Close session when done

Related Skills

# Web search (for research + browse)
npx skills add inference-sh/skills@web-search

# LLM models (analyze extracted content)
npx skills add inference-sh/skills@llm-models

Documentation