Agent Skills: Browser Interaction Protocol

Interactive browser automation using agent-browser. Use it to navigate dynamic sites and complex page states, handle authentication, and click or type into pages. Do NOT use it for simple read-only text extraction.

Category: Uncategorized
ID: Git-Fg/thecattoolkit/browsing-web

Install this agent skill locally:

pnpm dlx add-skill https://github.com/Git-Fg/thecattoolkit/tree/HEAD/plugins/sys-browser/skills/browsing-web

Skill Files

plugins/sys-browser/skills/browsing-web/SKILL.md

Skill Metadata

Name: browsing-web

Browser Interaction Protocol

Core Loop (The Ref Pattern)

You interact with the browser through references (@refs) taken from snapshots, not through CSS selectors.

  1. Navigate: agent-browser open "url"
  2. Snapshot: agent-browser snapshot -i (returns the accessibility tree, with refs such as @e1)
  3. Interact: agent-browser click @e1 (uses a ref from the snapshot)
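Step 3 needs a ref that came from step 2. A minimal sketch of that lookup as a shell function follows. `ref_for` is a hypothetical helper, not an agent-browser command, and it assumes each element appears on one line containing its role, its quoted name, and `[ref=eN]`, as in the example output shown under Common Patterns.

```shell
# Hypothetical helper (not an agent-browser command): print the @ref of the
# first element whose role and accessible name match, reading the output of
# `agent-browser snapshot -i` on stdin.
# Assumes one element per line, containing `role "name"` and `[ref=eN]`.
ref_for() {
  role=$1
  name=$2
  # -F matches the role/name text literally; the closing quote keeps
  # "Search" from matching "Search settings"
  ref=$(grep -F -- "$role \"$name\"" | grep -o 'ref=e[0-9][0-9]*' | head -n 1)
  [ -n "$ref" ] || return 1      # no such element in this snapshot
  printf '@%s\n' "${ref#ref=}"   # ref=e4 -> @e4
}

# Usage:
#   ref=$(agent-browser snapshot -i | ref_for button "Search") &&
#     agent-browser click "$ref"
```

Looking refs up by role and name, rather than copying them from an earlier run, keeps the loop honest: the ref always comes from the snapshot you just took.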

Critical Constraints

  1. Never Guess Selectors: You cannot know a ref such as @e1 in advance. You MUST run snapshot to see the current refs.
  2. Interactive Only: Always use snapshot -i to filter out non-interactive elements; this keeps the output short and saves tokens.
  3. Stateful: The browser session persists between commands, so you do not need to re-open the page.
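Constraint 1 can also be checked mechanically. The sketch below is a hypothetical guard (`ref_is_current` is not an agent-browser command) that accepts a ref only if it appears in the snapshot text you pass in; it assumes refs are printed as `[ref=eN]`, as in this document's example output. Because the browser is stateful, the guard is only as good as its input: pass it the output of the most recent `snapshot -i`.

```shell
# Hypothetical guard for "Never Guess Selectors": succeed only when REF
# (for example @e4) appears in SNAPSHOT, the text of a recent
# `agent-browser snapshot -i`.
# Assumes refs are printed as [ref=eN].
ref_is_current() {
  snapshot=$1
  id=${2#@}    # @e4 -> e4
  # -F matches literally; the trailing "]" stops e4 from matching e41
  printf '%s\n' "$snapshot" | grep -qF -- "[ref=$id]"
}

# Usage:
#   snap=$(agent-browser snapshot -i)
#   ref_is_current "$snap" @e4 && agent-browser click @e4
```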

Common Patterns

Navigation & Extraction

agent-browser open "https://google.com"
agent-browser snapshot -i
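# Output lists the search box (e2 here) and: [ref=e4] button "Search"
agent-browser fill @e2 "Claude Code"
agent-browser click @e4
agent-browser wait --load networkidle
# The page has changed: take a fresh snapshot before using any more refs
agent-browser snapshot -i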

Visual Verification

Take a screenshot only when the snapshot's structure is confusing:

agent-browser screenshot page.png
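One way to make "only when the structure is confusing" concrete is sketched below. It is a hypothetical heuristic, not agent-browser behavior: take a screenshot only when `snapshot -i` reports no interactive refs at all, so there is nothing to act on. It again assumes refs are printed as `[ref=eN]`.

```shell
# Hypothetical heuristic (not an agent-browser feature): succeed when the
# `agent-browser snapshot -i` output on stdin contains no [ref=eN] entries,
# i.e. there is nothing to act on and a screenshot may explain why.
needs_screenshot() {
  # grep -c prints the number of matching lines (0 when none match)
  count=$(grep -c 'ref=e[0-9]' || true)
  [ "$count" -eq 0 ]
}

# Usage:
#   agent-browser snapshot -i | needs_screenshot && agent-browser screenshot page.png
```

A screenshot costs far more tokens than a filtered snapshot, so the heuristic keeps it as a fallback rather than a routine step.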